Assume that observations are generated from an infinite-order 
autoregressive [AR($\infty$)] process. Shibata [Ann. Statist. 8 (1980) 147-164] 
considered the problem of choosing a finite-order AR model, 
allowing the order to become infinite as the number of observations 
does in order to obtain a better approximation. He showed that, 
for the purpose of predicting the future of an independent replicate, 
Akaike's information criterion (AIC) and its variants are asymptot- 
ically efficient. Although Shibata's concept of asymptotic efficiency 
has been widely accepted in the literature, it is not a natural prop- 
erty for time series analysis. This is because when new observations 
of a time series become available, they are not independent of the 
previous data. To overcome this difficulty, in this paper we focus on 
order selection for forecasting the future of an observed time series, 
referred to as same-realization prediction. We present the first theo- 
retical verification that AIC and its variants are still asymptotically 
efficient (in the sense defined in Section 4) for same-realization pre- 
dictions. To obtain this result, a technical condition, easily met in 
common practice, is introduced to simplify the complicated depen- 
dent structures among the selected orders, estimated parameters and 
future observations. In addition, a simulation study is conducted to 
illustrate the practical implications of AIC. This study shows that 
AIC also yields a satisfactory same-realization prediction in finite 
samples. On the other hand, a limitation of AIC in same-realization 
settings is pointed out. It is interesting to note that this limitation 
of AIC does not exist for corresponding independent cases. 

1. Introduction. To select a model for the realization of a stationary 
time series, it is common to assume that the realization comes from an 
autoregressive moving-average (ARMA) process whose AR and MA orders are 
known to lie within prescribed finite intervals. Then a model selection proce- 
dure is used to select orders within these intervals and thereby determine a 
model for the data. However, as pointed out by Burnham and Anderson [9], 
it is not common for the true model to be a function of a small number of 
unknown parameters, and a model having many parameters is sometimes 
essential to obtain a better approximation of the true model. From this 
perspective, a more flexible alternative to the ARMA assumption is the 
assumption that data are generated by an AR($\infty$) process. In this situation, 
the focus of model selection is usually placed on the forecasting ability of 
the chosen model, and not on the correctness of the selection. 

Shibata [27] gave the first justification for several model selection criteria 
along this line. He considered the problem of choosing a finite-order AR 
model, allowing the order to become infinite as the number of observations 
does. He showed that for the purpose of forecasting the future of an inde- 
pendent replicate, which is referred to as independent-realization prediction 
[see (1.4)], Akaike's information criterion (AIC) [2], the final prediction error 
(FPE) method [1] and $S_n(k)$ [27] are asymptotically efficient in the sense 
that no other selection criterion achieves a smaller limiting mean square pre- 
diction error criterion value. (Since this is an asymptotic result, the name 
AIC could also be thought of as an acronym for "Asymptotic Information 
Criterion.") Based on a similar analysis, Bhansali [5] extended Shibata's 
result to the case of multistep predictions. However, Shibata's concept of 
asymptotic efficiency, which focuses on independent-realization predictions, 
is not a natural property for time series analysis, because when new ob- 
servations of a time series become available, they are usually dependent on 
the previous data. So far, no time series model selection theory has been 
established without this unnatural assumption. This motivated our study. 

To begin with, let us assume that observations $x_1, \ldots, x_n$ come from a 
stationary AR($\infty$) process $\{x_t\}$ with 

(1.1) $x_t + \sum_{i=1}^{\infty} a_i x_{t-i} = e_t, \qquad t = \ldots, -1, 0, 1, \ldots,$

where $e_t$ is a sequence of independent random noise values with zero mean 
and variance $\sigma^2$, and the coefficients $a_i$ are absolutely summable. For 
predicting $x_{n+h}$, $h \geq 1$, we consider the finite-order approximation models 
AR(1), ..., AR($K_n$). Here, we allow the maximal order, $K_n$, to increase to 
infinity with $n$ in order to reduce approximation errors. The prediction for 
$x_{n+h}$ is referred to as same-realization prediction. For brevity, our 
theoretical discussion focuses only on the one-step prediction case, $h = 1$; 
the related extensions to cases $h > 1$ are straightforward, as discussed in 
Section 6. When model AR($k$), $1 \leq k \leq K_n$, is adopted, we use 
$\hat{\mathbf{a}}_n(k)$ to estimate the model's coefficient vector and use 

(1.2) $\hat{x}_{n+1}(k) = -\mathbf{x}_n'(k)\hat{\mathbf{a}}_n(k)$

to predict $x_{n+1}$, where $\mathbf{x}_j(k) = (x_j, \ldots, x_{j-k+1})'$ and 
$\hat{\mathbf{a}}_n(k) = (\hat{a}_{1,n}(k), \ldots, \hat{a}_{k,n}(k))'$ 

satisfies 

(1.3) $-\hat{R}_n(k)\hat{\mathbf{a}}_n(k) = \frac{1}{N}\sum_{j=K_n}^{n-1} \mathbf{x}_j(k) x_{j+1}$

with $N = n - K_n$ and 

$\hat{R}_n(k) = \frac{1}{N}\sum_{j=K_n}^{n-1} \mathbf{x}_j(k)\mathbf{x}_j'(k).$

Since the difference between $\hat{\mathbf{a}}_n(k)$ and the least squares estimator 
$\tilde{\mathbf{a}}_n(k)$, where 

$\tilde{\mathbf{a}}_n(k) = -\Bigl(\sum_{j=k}^{n-1}\mathbf{x}_j(k)\mathbf{x}_j'(k)\Bigr)^{-1}\sum_{j=k}^{n-1}\mathbf{x}_j(k)x_{j+1},$

is asymptotically negligible under the assumptions on $K_n$ and $x_t$ that we use 
herein (see Section 2), $\hat{x}_{n+1}(k)$ is still called the least squares predictor. 
For assessing the model's predictive ability, we consider the second-order 
(unconditional) mean-squared prediction error (MSPE), $l_n(k)$, of $\hat{x}_{n+1}(k)$, 
where 

$l_n(k) = E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2.$
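
The estimator in (1.3) and the predictor in (1.2) can be sketched numerically as follows. This is a minimal illustration rather than code from the paper: the function names `fit_ar_ls` and `predict_next` are hypothetical, the series is stored 0-based so that $x_j$ corresponds to `x[j-1]`, and $k \leq K_n$ is assumed.

```python
import numpy as np

def fit_ar_ls(x, k, K_n):
    """Solve (1.3) for the AR(k) coefficient estimate, summing over j = K_n, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    N = n - K_n
    # Row for 1-based index j is x_j(k)' = (x_j, ..., x_{j-k+1}), i.e. x[j-k:j] reversed.
    X = np.array([x[j - k:j][::-1] for j in range(K_n, n)])
    y = x[K_n:n]                       # targets x_{j+1}, j = K_n, ..., n-1
    R_hat = X.T @ X / N                # \hat R_n(k)
    rhs = X.T @ y / N                  # (1/N) * sum of x_j(k) x_{j+1}
    return -np.linalg.solve(R_hat, rhs)

def predict_next(x, a_hat):
    """One-step predictor (1.2): minus x_n(k)' times the coefficient estimate."""
    k = len(a_hat)
    x_n_k = np.asarray(x, dtype=float)[-k:][::-1]   # (x_n, ..., x_{n-k+1})
    return -x_n_k @ a_hat
```

For a noiseless series $x_t = 0.5\,x_{t-1}$, the AR(1) fit returns a coefficient of $-0.5$ under the sign convention of (1.1), and the prediction equals $0.5\,x_n$.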

In Section 2 some asymptotic properties of $l_n(k)$ from a companion paper [17] 
are introduced. In particular, Proposition 2 of Section 2 shows that 
$l_n(k)$ can be uniformly (in $k$) approximated by 
$L_n(k) = (k/N)\sigma^2 + \|\mathbf{a} - \mathbf{a}(k)\|_R^2$, where $\mathbf{a}(k)$ is defined after (2.3), 
$\mathbf{a} = (a_1, a_2, \ldots)'$ is an infinite-dimensional vector with the $a_i$ defined 
in (1.1), and $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$ is defined after (2.6). The first term of 
$L_n(k)$, $(k/N)\sigma^2$, which is proportional to the order of the candidate 
model, $k$, can be viewed as a measure of model complexity. The second term 
of $L_n(k)$, $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$, which decreases as $k$ increases, measures the 
goodness of fit. Proposition 3 of Section 2 further points out that $L_n(k)$ 
can also be used to uniformly approximate 
$l_{n,0}(k) = E(y_{n+1} - \hat{y}_{n+1}(k))^2 - \sigma^2$, the second-order unconditional MSPE for 
independent-realization predictions. Here $\{y_1, \ldots, y_n\}$ is a realization from 
an independent copy of $\{x_t\}$, $y_{n+1}$ is the future observation to be predicted, 
and the predictor $\hat{y}_{n+1}(k)$ is given by 

(1.4) $\hat{y}_{n+1}(k) = -\mathbf{y}_n'(k)\hat{\mathbf{a}}_n(k),$

with $\hat{\mathbf{a}}_n(k)$ [see (1.3)] obtained from $x_1, \ldots, x_n$ and $\mathbf{y}_n'(k) = (y_n, \ldots, y_{n+1-k})$. 
Therefore, these two types of MSPEs are asymptotically equivalent. To us 
this equivalence is somewhat surprising because some recent studies have 
shown that this equivalence does not hold in other situations; see the dis- 
cussion after Proposition 3 for details. It can be erroneous to directly assume 
that the results from same-realization predictions will be the same as those 
for corresponding independent cases without theoretical justification. 

When the order of the least squares predictor is selected by an order 
selection criterion, due to more complicated probabilistic structures, ana- 
lyzing the predictor's MSPE becomes more difficult. Section 3 is devoted 
to this problem. For independent-realization predictions, Theorem 1 of 
Section 3 provides an asymptotic expression for the second-order unconditional 
MSPE of $\hat{y}_{n+1}(\hat{k}_n)$, 

(1.5) $l_{n,0}(\hat{k}_n) = E(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 - \sigma^2,$

where $1 \leq \hat{k}_n = \hat{k}_n(x_1, \ldots, x_n) \leq K_n$ is an order determined by AIC, FPE, 
$C_p$ [22], $S_p$ [14] or $S_n(k)$. The reason why $C_p$ and $S_p$ are included in our 
analysis is given in Remark 2 of Section 3. We are interested in the other 
criteria because their asymptotic optimalities for independent-realization 
predictions were justified by Shibata [27] through a conditional version of 
$l_{n,0}(\hat{k}_n)$, namely, 

$E\{(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 \mid x_1, \ldots, x_n\} - \sigma^2;$

see (3.1) and (4.11) for more details. However, since this paper focuses on the 
unconditional MSPE, an extension of Shibata's result to the unconditional 
case is needed. It should be noted that this extension is nontrivial since there 
are several technical gaps to be bridged, as detailed in Section 5. According 
to Theorem 1, $l_{n,0}(\hat{k}_n)$ with $\hat{k}_n$ selected by these criteria can ultimately 
achieve the best compromise between model complexity and goodness of fit, 
provided $\{x_t\}$ is truly an infinite-order AR process. In view of this result, it 
is interesting to ask whether AIC [$C_p$, $S_p$, FPE or $S_n(k)$] still possesses a 
similar property for same-realization predictions. The main difficulty of this 
question lies in the fact that the selected orders, estimated parameters and 
future observations are all stochastically dependent in the same-realization 
case. Since, as observed in (1.5), the future observations are independent of 
the estimated parameters and the selected order for independent-realization 
predictions, the approaches used in [27] and Theorem 1 are no longer 
applicable in the same-realization case. To overcome this difficulty, an 
assumption on $L_n(k)$, assumption (K.6), is introduced in Section 3. Two 
examples are given to illustrate that assumption (K.6) is easily met in 
common practice. Based on this assumption (among others), Theorem 2 (also in 
Section 3) shows that 

$E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n))^2 - \sigma^2$

and 

$E(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 - \sigma^2,$

with $\hat{k}_n$ selected by AIC [$C_p$, $S_p$, FPE or $S_n(k)$], have the same asymptotic 
expressions. Moreover, we also apply the same techniques to analyze some 
other AIC-like criteria having different penalty functions; see Corollary 1 
and Remark 3 of Section 3. Armed with Corollary 1, the performances of 
these criteria are evaluated for the first time from the same-realization 
prediction point of view. 

In Section 4 the results obtained in Section 3 are re-examined in greater 
depth. In particular, we show that, for same-realization predictions, the 
predictor with an order determined by AIC [FPE, $S_p$, $C_p$ or $S_n(k)$] is 
ultimately no worse than the best predictor among the candidate predictors, 
$\{\hat{x}_{n+1}(1), \ldots, \hat{x}_{n+1}(K_n)\}$. This property is referred to as asymptotic 
efficiency; see (4.1). To the authors' knowledge, this is the first result that 
confirms AIC's (and its variants') validity in same-realization settings. In 
addition, a simulation study is conducted to illustrate the practical implications 
of AIC. This study shows that AIC also yields a satisfactory same-realization 
prediction in many finite-sample situations; see Table 1 in Section 4 for more 
details. On the other hand, a limitation of AIC in same-realization settings 
is demonstrated. Empirical results, given in Table 2 in Section 4, reveal 
that it seems very difficult for AIC to possess strong asymptotic efficiency; 
see (4.5) for the definition. This is a somewhat interesting discovery because 
we show at the end of Section 4 that AIC has no such difficulty when it is 
used for independent-realization predictions. It is worth noting that AIC's 
asymptotic efficiency is established under the assumption that the 
underlying process is truly an AR($\infty$) process. If the order of the true model is 
finite, then BIC-like criteria, for example, BIC [24] and HQ [13], can 
choose the smallest true model with probability tending to 1, but AIC does 
not possess this optimal property (see [26]). Therefore, to achieve optimal 
same-realization predictions in situations where the underlying AR model 
has a possibly finite order, further investigation is still required. For ease 
of reading, all proofs of the results in Section 3 are deferred to Section 5. 
Concluding remarks are given in Section 6, which also discusses moment 
restrictions, connections between time series and regression model selections, 
and extensions to the multivariate case. 
2. Preliminary results. We first list several assumptions essential to the 
following analysis. 

(K.1) Let $\{x_t\}$ be a linear process satisfying (1.1) with 
$A(z) = 1 + a_1 z + a_2 z^2 + \cdots \neq 0$ for $|z| \leq 1$. Furthermore, let the 
coefficients $\{a_i\}$ obey one of the following restrictions: 
(a) $\sum_{i=1}^{\infty} |a_i| < \infty$, (b) $\sum_{i=1}^{\infty} |i^{1/2} a_i| < \infty$, or 
(c) $\sum_{i=1}^{\infty} |i a_i| < \infty$. 

(K.2) Let the distribution function of $e_t$ be denoted by $F_t$. Some positive 
numbers $\alpha$, $\delta$ and $C_0$ exist such that, for all $t = \ldots, -1, 0, 1, \ldots$ and 
$|x - y| \leq \delta$, 

$|F_t(x) - F_t(y)| \leq C_0 |x - y|^{\alpha}.$

(K.3) $\sup_{-\infty < t < \infty} E|e_t|^s < \infty$, $s = 1, 2, \ldots$. 

(K.4) Let $K_n$ be chosen to satisfy 

$C_l \leq \frac{K_n^{2+\delta_1}}{n} \leq C_u$

for some positive numbers $\delta_1$, $C_l$ and $C_u$. 

(K.5) $a_n \neq 0$ for infinitely many $n$. 

Remark 1. (K.1)(a) implies that $x_t$ has a one-sided infinite moving-average 
representation ([30], page 245), 

(2.1) $x_t = \sum_{i=0}^{\infty} b_i e_{t-i},$

where the $b_i$ are absolutely summable with $b_0 = 1$, and the polynomial 
$B(z) = A^{-1}(z) = 1 + b_1 z + b_2 z^2 + \cdots$ is bounded away from zero for $|z| \leq 1$. 
Therefore, the spectral density function, $f(\lambda)$, of $\{x_t\}$ satisfies 
$f_1 \leq f(\lambda) \leq f_2$ for some $0 < f_1 \leq f_2 < \infty$, where $-\pi \leq \lambda \leq \pi$. This 
property also ensures that $\sup_{k \geq 1} \|R(k)\| < \infty$ and 
$\sup_{k \geq 1} \|R^{-1}(k)\| < \infty$, where 

(2.2) $R(k) = E(\mathbf{x}_n(k)\mathbf{x}_n'(k))$

and $\|A\|^2 = \lambda_{\max}(A'A)$ denotes the maximal eigenvalue of the matrix $A'A$. 
Moreover, according to Brillinger ([8], Theorem 3.8.4), (K.1)(b) and (K.1)(c) 
imply that $\sum_{i=1}^{\infty} |i^{1/2} b_i| < \infty$ and $\sum_{i=1}^{\infty} |i b_i| < \infty$, respectively. 



The MSPE of $\hat{x}_{n+1}(k)$ can be expressed as 

(2.3) $E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2 = E(f(k) + S(k))^2,$

where $1 \leq k \leq K_n$, 

$f(k) = \mathbf{x}_n'(k)\hat{R}_n^{-1}(k)\frac{1}{N}\sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1,k},$

$e_{j+1,k} = x_{j+1} + \mathbf{x}_j'(k)\mathbf{a}(k),$

$\mathbf{a}(k) = (a_1(k), \ldots, a_k(k))'$ 
is the minimizer of $m_k(\mathbf{c}) = E(x_{k+1} + \mathbf{x}_k'(k)\mathbf{c})^2$, $\mathbf{c} \in R^k$, and 

$S(k) = \sum_{i=1}^{\infty} (a_i - a_i(k)) x_{n+1-i},$

with $a_i(k) = 0$ for $i > k$. To simplify the notation, $\mathbf{a}(k)$ is sometimes viewed 
as an infinite-dimensional vector with entries $a_i(k)$, $i = 1, 2, \ldots$. 

To find an asymptotic expression for $l_n(k) = E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2$, 
Proposition 1 below deals with the moment properties of $\hat{R}_n^{-1}(k)$, where 
$\hat{R}_n(k)$ is defined after (1.3). Its proof can be found in [17] [see equations 
(2.27) and (2.28) and Theorem 2]. For the sake of convenience, in the rest of 
this paper we use $C$ to denote a generic positive constant independent of the 
sample size $n$ and of any index with an upper (or lower) limit depending on 
$n$. But $C$ may depend on the distributional properties of $x_t$. It also may 
have different values in different places. 

Proposition 1. Assume (K.1)(a), (K.2), (K.3) and (K.4). Then, for 
any $q > 0$, 

(2.4) $\max_{1 \leq k \leq K_n} E\|\hat{R}_n^{-1}(k)\|^q \leq C$

and 

(2.5) $\max_{1 \leq k \leq K_n} \frac{E\|\hat{R}_n^{-1}(k) - R^{-1}(k)\|^q}{(k^2/N)^{q/2}} \leq C$

hold for all sufficiently large $n$, where $R^{-1}(k)$ denotes the inverse of $R(k)$ 
[see (2.2)]. 

Armed with Proposition 1, Ing and Wei ([17], Theorem 3) obtained an 
asymptotic expression for $l_n(k)$ which holds uniformly for all $1 \leq k \leq K_n$. 
This result is summarized in the following proposition. 

Proposition 2. Assume (K.1)(b), (K.2), (K.3) and (K.4). Then 

(2.6) $\lim_{n \to \infty} \max_{1 \leq k \leq K_n} \left| \frac{E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2}{L_n(k)} - 1 \right| = 0,$

where $L_n(k) = (k/N)\sigma^2 + \|\mathbf{a} - \mathbf{a}(k)\|_R^2$, $\mathbf{a} = (a_1, a_2, \ldots)'$, $\mathbf{a}(k)$ is now viewed 
as an infinite-dimensional vector, and for an infinite-dimensional vector 
$\mathbf{d} = (d_1, d_2, \ldots)'$, 

$\|\mathbf{d}\|_R^2 = \sum_{1 \leq i, j < \infty} d_i d_j \gamma_{i-j},$

with $\gamma_{i-j} = E(x_i x_j)$. We also note that $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = E(S^2(k))$ decreases 
as $k$ increases. 
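
To make $L_n(k)$ concrete, the sketch below evaluates it for an MA(1) process $x_t = e_t - \theta e_{t-1}$, which is an AR($\infty$) process when $|\theta| < 1$. It relies on an identity that follows from (2.3) and the definition of $e_{j+1,k}$ but is not stated in this form in the text: since $e_{n+1,k} = e_{n+1} - S(k)$ and $e_{n+1}$ is independent of the past, $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = E(e_{n+1,k}^2) - \sigma^2$, the excess one-step error variance of the best AR($k$) predictor. That variance is computed with the Levinson-Durbin recursion. The function names and the MA(1) example are illustrative assumptions, not part of the paper.

```python
import numpy as np

def innovation_variances(gamma, k_max):
    """Levinson-Durbin recursion.

    gamma[h] is the autocovariance at lag h (h = 0..k_max).
    Returns v with v[k] = one-step error variance of the best AR(k) predictor.
    """
    v = np.empty(k_max + 1)
    v[0] = gamma[0]
    phi = np.zeros(0)                       # predictor coefficients of order k-1
    for k in range(1, k_max + 1):
        # Reflection coefficient of order k
        kappa = (gamma[k] - phi @ gamma[k - 1:0:-1]) / v[k - 1]
        phi = np.concatenate([phi - kappa * phi[::-1], [kappa]])
        v[k] = v[k - 1] * (1.0 - kappa ** 2)
    return v

def L_n_ma1(theta, sigma2, N, k_max):
    """L_n(k), k = 1..k_max, for the MA(1) process x_t = e_t - theta * e_{t-1}."""
    gamma = np.zeros(k_max + 1)
    gamma[0] = (1.0 + theta ** 2) * sigma2
    gamma[1] = -theta * sigma2
    v = innovation_variances(gamma, k_max)
    k = np.arange(1, k_max + 1)
    complexity = k * sigma2 / N             # first term of L_n(k)
    misfit = v[1:] - sigma2                 # ||a - a(k)||_R^2 via the identity above
    return complexity + misfit
```

With $\theta = 0.6$, $\sigma^2 = 1$ and $N = 200$, the complexity term overtakes the rapidly decaying misfit term at a small order, so the minimizer of $L_n(k)$ is a low-order model even though the true AR order is infinite.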
The following result provides an asymptotic expression for the MSPE of 
the least squares predictor, $\hat{y}_{n+1}(k)$, in independent-realization settings. 

Proposition 3. Assume that the assumptions of Proposition 2 hold. 
Then 

(2.7) $\lim_{n \to \infty} \max_{1 \leq k \leq K_n} \left| \frac{E\{(y_{n+1} - \hat{y}_{n+1}(k))^2\} - \sigma^2}{L_n(k)} - 1 \right| = 0.$



A proof of Proposition 3 can be found in Theorem 4 of [17]. In view of 
(2.6) and (2.7), both types of second-order MSPEs can be uniformly 
approximated by the same function, $L_n(k)$, and, hence, they are 
asymptotically equivalent. However, this equivalence should not be taken for granted. 
To see this, Ing [15] recently showed that, if the underlying process is a 
random walk model and the assumed model is correctly specified, then 

$\lim_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(1))^2 - \sigma^2}{E(y_{n+1} - \hat{y}_{n+1}(1))^2 - \sigma^2} = \frac{2}{13.2859}.$

Therefore, the equivalence mentioned above does not hold in this example. 
For stationary AR processes, Kunitomo and Yamamoto ([20], pages 946-947) 
also considered a comparison between same- and independent-realization 
MSPEs. They showed that the difference between the terms of order $1/n$ of 
the two types of MSPEs can be substantial when a fixed-order and 
underspecified AR model is used. [Note that their conclusion does not contradict 
that obtained from (2.6) and (2.7), because the second-order MSPE is of 
order $O(1)$ in the underspecified and fixed-order case.] These comparisons 
show that the difference between the MSPEs in two types of forecasting 
settings should be carefully examined in each different situation. It can be 
erroneous to directly assume that the results for same-realization predictions 
will be the same as those for the corresponding independent case without 
theoretical justification. 

Due to more complicated probabilistic structures, this analysis becomes 
more difficult when the order of the predictor is selected by a data-driven 
method. This situation is considered in the following section. 

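3. Asymptotic expressions for the MSPEs of AIC and its variants. Values 
of $S_n$, AIC, FPE, $S_p$ and $C_p$ for an AR($k$) model are defined by 

$S_n(k) = (N + 2k)\hat{\sigma}_k^2,$

$\mathrm{AIC}(k) = \log \hat{\sigma}_k^2 + \frac{2k}{N},$

$\mathrm{FPE}(k) = \frac{N + k}{N - k}\hat{\sigma}_k^2,$

$S_p(k) = \Bigl(1 + \frac{k}{N - k - 1}\Bigr)\tilde{\sigma}_k^2$

and 

$C_p(k) = N\hat{\sigma}_k^2 - (N - 2k)\tilde{\sigma}_{K_n}^2,$

respectively, where 

$\hat{\sigma}_k^2 = \frac{1}{N}\sum_{t=K_n}^{n-1} (x_{t+1} + \hat{a}_{1,n}(k)x_t + \cdots + \hat{a}_{k,n}(k)x_{t+1-k})^2$

and 

$\tilde{\sigma}_k^2 = \frac{N}{N - k}\hat{\sigma}_k^2.$

Also define 

$\hat{k}_n^S = \arg\min_{1 \leq k \leq K_n} S_n(k),$

$\hat{k}_n^A = \arg\min_{1 \leq k \leq K_n} \mathrm{AIC}(k),$

$\hat{k}_n^F = \arg\min_{1 \leq k \leq K_n} \mathrm{FPE}(k),$

$\hat{k}_n^{S_p} = \arg\min_{1 \leq k \leq K_n} S_p(k)$

and 

$\hat{k}_n^C = \arg\min_{1 \leq k \leq K_n} C_p(k).$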
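
The five criteria can be computed side by side as in the following sketch. It is illustrative only and rests on stated assumptions: $\hat{\sigma}_k^2$ is the residual variance over $t = K_n, \ldots, n-1$ divided by $N = n - K_n$; FPE, $S_p$ and $C_p$ are taken in their standard forms, with $\tilde{\sigma}_k^2 = N\hat{\sigma}_k^2/(N-k)$; and the function names are hypothetical.

```python
import numpy as np

def sigma2_hat(x, k, K_n):
    """Residual variance of the AR(k) least squares fit over t = K_n, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.array([x[j - k:j][::-1] for j in range(K_n, n)])  # rows x_j(k)'
    y = x[K_n:n]                                             # targets x_{j+1}
    a_hat = -np.linalg.solve(X.T @ X, X.T @ y)               # same solution as (1.3)
    resid = y + X @ a_hat
    return resid @ resid / (n - K_n)

def criteria(x, K_n):
    """Values of S_n, AIC, FPE, S_p and C_p for k = 1, ..., K_n (assumed forms)."""
    N = len(x) - K_n
    ks = np.arange(1, K_n + 1)
    s2 = np.array([sigma2_hat(x, k, K_n) for k in ks])
    s2_tilde = N * s2 / (N - ks)          # degrees-of-freedom corrected variance
    return {
        "S_n": (N + 2 * ks) * s2,
        "AIC": np.log(s2) + 2 * ks / N,
        "FPE": (N + ks) / (N - ks) * s2,
        "S_p": (1 + ks / (N - ks - 1)) * s2_tilde,
        "C_p": N * s2 - (N - 2 * ks) * s2_tilde[-1],
    }

def selected_orders(x, K_n):
    """Order chosen by each criterion (argmin over 1 <= k <= K_n)."""
    return {name: int(np.argmin(vals)) + 1 for name, vals in criteria(x, K_n).items()}
```

Because every candidate order is fitted on the same $N$ rows, the models are nested and $\hat{\sigma}_k^2$ is nonincreasing in $k$; the criteria differ only in how they penalize $k$.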



For independent-realization predictions, Shibata ([27], Section 4), assuming 
(K.1)(a), (K.5), $K_n = o(n^{1/2})$ and Gaussian noise, showed that 

(3.1) $\frac{E\{(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 \mid x_1, \ldots, x_n\} - \sigma^2}{L_n(k_n^*)} = 1 + o_p(1),$

where $1 \leq \hat{k}_n \leq K_n$ equals $\hat{k}_n^S$, $\hat{k}_n^A$ or $\hat{k}_n^F$, and $k_n^*$ is defined implicitly through 
$L_n(k_n^*) = \min_{1 \leq k \leq K_n} L_n(k)$. (Note that $k_n^* \to \infty$, provided $K_n \to \infty$ and 
(K.5) holds; see [27], page 154.) However, since, as mentioned in the first 
section, this paper focuses on the unconditional MSPE, an unconditional 
version of (3.1) is given in the following theorem. 

Theorem 1. Let the assumptions of Proposition 2 and (K.5) hold. Then 

$\lim_{n \to \infty} \frac{E(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 - \sigma^2}{L_n(k_n^*)} = 1,$

where $\hat{k}_n = \hat{k}_n^S$, $\hat{k}_n^A$, $\hat{k}_n^F$, $\hat{k}_n^{S_p}$ or $\hat{k}_n^C$. 

Remark 2. It is unclear from Shibata's [27] paper whether (3.1) holds 
with $\hat{k}_n = \hat{k}_n^C$ or $\hat{k}_n^{S_p}$. In fact, the predictive abilities of both $C_p$ and $S_p$ in 
AR($\infty$) models have seldom been discussed in the literature. On the other hand, $C_p$ 
and AIC have been proven to be asymptotically equivalent in the 
regression model with infinitely many parameters; see, for example, [28] and [25]. 
Under a similar situation, Breiman and Freedman [7] also established $S_p$'s 
asymptotic optimality for prediction (see Section 6 for more details). These 
previous results motivated us to include $C_p$ and $S_p$ in the analysis. 

Theorem 1 shows that, for independent-realization settings, the 
second-order (unconditional) MSPE of the least squares predictor with the order 
selected by AIC, $S_n(k)$, FPE, $S_p$ or $C_p$ can ultimately achieve the best 
compromise between model complexity, $(k/N)\sigma^2$, and goodness of fit, $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$. 
This result led us to ask whether AIC still possesses a similar property for 
same-realization predictions. Since the model selection criteria, estimated 
parameters and future observations are all stochastically dependent in 
same-realization settings, we impose the following assumption on $L_n(k)$ in order 
to simplify the dependent structures among these components. 

(K.6) For any $\xi > 0$, there is an exponent $\theta = \theta(\xi)$ with $0 < \theta < 1$ such that, 
for all large $n$ and all $k \in A_{n,\theta} = \{k : 1 \leq k \leq K_n, |k - k_n^*| \geq k_n^{*\theta}\}$, 

(3.2) $k_n^{*\xi}\,\frac{N(L_n(k) - L_n(k_n^*))}{|k - k_n^*|} \geq C > 0,$

where $K_n$ satisfies (K.4) and $C$ is some positive constant independent 
of $n$. 

Note that if $\{x_t\}$ is a stationary AR model of finite order, then (3.2) holds 
automatically. When $\{x_t\}$ is truly a stationary AR($\infty$) model, the following 
two examples also show that (3.2) is flexible enough to accommodate a 
variety of applications. 

Example 1 (Exponential-decay case). Assume that, for all $k = 0, 1, \ldots$, 
the AR coefficients satisfy 

(3.3) $C_1 k^{-\theta_1} e^{-\beta k} \leq \sum_{i \geq k} a_i^2 \leq C_2 k^{\theta_1} e^{-\beta k},$

where $\theta_1$ is some nonnegative number, and $\beta$, $C_2$ and $C_1$ are some positive 
numbers with $C_2 \geq C_1$. Note that (3.3) is satisfied by any causal and 
invertible ARMA($p$, $q$) model with $q > 0$. It is shown in the Appendix that 
(3.3) and (K.4) yield 

$k_n^* = \frac{1}{\beta}\log N + O(\log_2 N),$

where log denotes the natural logarithm and $\log_2 N = \log(\log N)$. This result 
and (3.3) further ensure that, for any $0 < \eta < 1$ and all $|k - k_n^*| \geq k_n^{*\eta}$, 

(3.4) $\frac{N(L_n(k) - L_n(k_n^*))}{|k - k_n^*|} \geq C_3$

holds for all sufficiently large $n$ and some positive number $C_3$ independent 
of $n$. Hence, for any $\xi > 0$, (3.2) is satisfied with any $0 < \theta < 1$. For a proof 
of (3.4), see the Appendix. 

Example 2 (Algebraic-decay case). Assume that, for all $k = 0, 1, \ldots$, 

(3.5) $(C_4 - M_1 k^{-\xi_1})k^{-\beta} \leq \|\mathbf{a} - \mathbf{a}(k)\|_R^2 \leq (C_4 + M_1 k^{-\xi_1})k^{-\beta},$

where $C_4$, $M_1$, $\xi_1$ and $\beta$ are some positive numbers. Note that for 
independent-realization predictions, Shibata ([27], page 162) gave a similar condition, 

(3.6) $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = C_5 k^{-\beta},$

to illustrate that AIC is strictly better than the other criteria that have 
different weights for penalizing the number of regressors in the model. 
However, since (3.6) imposes a rather restrictive limitation on $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$, we 
use (3.5) to replace it. Under (K.4) and (3.5) with $\xi_1 \geq 2$ and $\beta > 1 + \delta_1$ 
[note that $\delta_1$, defined in (K.4), can be an arbitrarily small positive number], 
we show in the Appendix that 

$k_n^* = \Bigl(\frac{\sigma^2}{\beta C_4 N}\Bigr)^{-1/(\beta+1)} + O(1),$

and for some positive number $C_6$ and all $|k - k_n^*| \geq C_6$, 

(3.7) $\frac{N(L_n(k) - L_n(k_n^*))}{|k - k_n^*|} \geq C_7\Bigl(\frac{|k - k_n^*|}{k_n^*} \wedge 1\Bigr)$

holds for all sufficiently large $n$ and some positive number $C_7$ independent 
of $n$, where, for real numbers $a$ and $b$, $a \wedge b = a$ if $a \leq b$ and $a \wedge b = b$ if $a > b$. 
Therefore, for any $\xi > 0$, (3.2) is satisfied with any $1 - \min\{\xi, 1\} < \theta < 1$. 
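
The leading term of $k_n^*$ in Example 2 can be checked numerically. The sketch below assumes the idealized exact decay $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = C_4 k^{-\beta}$ [that is, (3.5) with $M_1 = 0$], so that $L_n(k) = k\sigma^2/N + C_4 k^{-\beta}$; setting the derivative in $k$ to zero gives the continuous minimizer $(\sigma^2/(\beta C_4 N))^{-1/(\beta+1)}$. The function name and parameter values are hypothetical, and the search range is chosen for convenience rather than tied to (K.4).

```python
import numpy as np

def k_star_algebraic(N, beta, C4, sigma2=1.0):
    """Discrete minimizer of L_n(k) = k*sigma2/N + C4*k**(-beta) and its continuous analogue."""
    k_cont = (sigma2 / (beta * C4 * N)) ** (-1.0 / (beta + 1.0))
    K_max = int(4 * k_cont) + 10                 # search range comfortably past k_cont
    k = np.arange(1, K_max + 1, dtype=float)
    L = k * sigma2 / N + C4 * k ** (-beta)
    return int(k[np.argmin(L)]), k_cont

for N in (100, 1_000, 10_000, 100_000):
    k_disc, k_cont = k_star_algebraic(N, beta=2.0, C4=1.0)
    print(N, k_disc, round(k_cont, 2))
```

Since this idealized $L_n(k)$ is convex in $k$, the integer minimizer is always within one unit of the continuous one, in line with the $O(1)$ remainder in the expression for $k_n^*$.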

The above discussion shows that assumption (K.6) is quite natural from 
both practical and theoretical points of view, since it includes the ARMA 
models (which are the most used short-memory time series models in 
practice) and the AR models with algebraic-decay coefficients (which are of much 
theoretical interest in the context of model selection) as special cases. 
Technically speaking, (3.2) gives $L_n(k)$ a basin-like shape such that, for $k$ 
distant from the bottom (falling into $A_{n,\theta}$), the probability of $\{\hat{k}_n = k\}$, with 
$\hat{k}_n = \hat{k}_n^S$, $\hat{k}_n^A$, $\hat{k}_n^F$, $\hat{k}_n^{S_p}$ or $\hat{k}_n^C$, is "sufficiently" small (see the proof of Theorem 2 
for more details). Now the main result of this section is stated as follows. 

Theorem 2. Let the assumptions of Theorem 1 and (K.6) hold. Then 
for same-realization predictions, one has 

(3.8) $\lim_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n))^2 - \sigma^2}{L_n(k_n^*)} = 1,$

where $\hat{k}_n = \hat{k}_n^S$, $\hat{k}_n^A$, $\hat{k}_n^F$, $\hat{k}_n^{S_p}$ or $\hat{k}_n^C$. 



In the literature the penalty for the number of regressors in the model 
sometimes has a weight different from that used in AIC. Following [3], we 
now consider $\mathrm{AIC}_\alpha(k)$, where $\alpha > 1$, 

$\mathrm{AIC}_\alpha(k) = \log \hat{\sigma}_k^2 + \frac{\alpha k}{N},$

$L_n^{(\alpha)}(k) = \frac{(\alpha - 1)k\sigma^2}{N} + \|\mathbf{a} - \mathbf{a}(k)\|_R^2,$

$\hat{k}_n^{A_\alpha} = \arg\min_{1 \leq k \leq K_n} \mathrm{AIC}_\alpha(k)$

and 

$k_n^{*(\alpha)} = \arg\min_{1 \leq k \leq K_n} L_n^{(\alpha)}(k).$

To investigate the performances of $\mathrm{AIC}_\alpha(k)$, $\alpha > 1$, for same-realization 
predictions, we need the following analog of (K.6). 

(K.6') For any $\xi > 0$, there exists an exponent $\theta = \theta(\xi)$ with $0 < \theta < 1$ such 
that, for all large $n$ and all 
$k \in A_{n,\theta}^{(\alpha)} = \{k : 1 \leq k \leq K_n, |k - k_n^{*(\alpha)}| \geq (k_n^{*(\alpha)})^{\theta}\}$, 

(3.9) $(k_n^{*(\alpha)})^{\xi}\,\frac{N(L_n^{(\alpha)}(k) - L_n^{(\alpha)}(k_n^{*(\alpha)}))}{|k - k_n^{*(\alpha)}|} \geq C > 0,$

where $\alpha > 1$, $K_n$ satisfies (K.4) and $C$ is some positive number 
independent of $n$. 

For any $\alpha > 1$, it is easy to see that (3.9) is fulfilled by finite-order 
stationary AR models. By arguments similar to those given in the Appendix, we can 
also show that, for any $\alpha > 1$, (3.9) is satisfied by stationary AR($\infty$) 
models with coefficients which obey (3.3) or (3.5) (with $\beta > 1 + \delta_1$ and $\xi_1 \geq 2$). 
Therefore, assumption (K.6'), like assumption (K.6), is reasonable for a wide 
range of applications. In Corollary 1 we obtain an asymptotic expression for 
the MSPE of $\hat{x}_{n+1}(\hat{k}_n^{A_\alpha})$, $\alpha > 1$, in same-realization settings. 
Corollary 1. Let the assumptions of Theorem 2 hold with (K.6) 
replaced by (K.6'). Then 

(3.10) $\lim_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^{A_\alpha}))^2 - \sigma^2}{L_n(k_n^{*(\alpha)})} = 1.$

Remark 3. By arguments like those used to prove Corollary 1, (3.10) still 
holds with $\hat{k}_n^{A_\alpha}$ replaced by $\hat{k}_n^{F_\alpha}$ or $\hat{k}_n^{S_\alpha}$, where $\alpha > 1$, 

$\hat{k}_n^{F_\alpha} = \arg\min_{1 \leq k \leq K_n} \mathrm{FPE}_\alpha(k) = \arg\min_{1 \leq k \leq K_n} \Bigl(1 + \frac{\alpha k}{N}\Bigr)\hat{\sigma}_k^2$

and 

$\hat{k}_n^{S_\alpha} = \arg\min_{1 \leq k \leq K_n} S_n^{(\alpha)}(k) = \arg\min_{1 \leq k \leq K_n} (N + \alpha k)\hat{\sigma}_k^2.$

Therefore, $\mathrm{AIC}_\alpha(k)$, $\mathrm{FPE}_\alpha(k)$ and $S_n^{(\alpha)}(k)$ are (asymptotically) equally 
efficient for the same choice of $\alpha$. Note that $\mathrm{FPE}_\alpha(k)$ and $S_n^{(\alpha)}(k)$ were first 
introduced by Bhansali and Downham [6] and Shibata [27], respectively. 

To illustrate Corollary 1, we first consider a special case of (3.3), 

(3.11) $C_1 e^{-\beta k} \leq \sum_{i \geq k} a_i^2 \leq C_2 e^{-\beta k},$

where $\beta$, $C_1$ and $C_2$ are some positive numbers with $C_2 \geq C_1$. Condition 
(3.11) is satisfied by any causal and invertible ARMA($p$, $q$) process with 
$q > 0$. Under (3.11), it can be shown that, for any $\alpha > 1$, 

$\lim_{n \to \infty} \frac{L_n(k_n^{*(\alpha)})}{L_n(k_n^*)} = 1.$

This fact and (3.10) yield that, for any two positive numbers $\alpha_1$ and $\alpha_2$ larger 
than 1, $\mathrm{AIC}_{\alpha_1}(k)$ and $\mathrm{AIC}_{\alpha_2}(k)$ are asymptotically equivalent, namely, 

(3.12) $\lim_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^{A_{\alpha_2}}))^2 - \sigma^2}{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^{A_{\alpha_1}}))^2 - \sigma^2} = 1.$

Next consider the algebraic-decay case (3.5) with $\xi_1 \geq 2$ and $\beta > 1 + \delta_1$. 
By arguments similar to those used for obtaining (3.7) and Case I of [27], 
page 162, one has, for $1 < \alpha_2 < \alpha_1 \leq 2$ or $2 \leq \alpha_1 < \alpha_2 < \infty$, 

$\liminf_{n \to \infty} \frac{L_n(k_n^{*(\alpha_2)})}{L_n(k_n^{*(\alpha_1)})} > 1$

and, hence, for $1 < \alpha_2 < \alpha_1 \leq 2$ or $2 \leq \alpha_1 < \alpha_2 < \infty$, 

(3.13) $\liminf_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^{A_{\alpha_2}}))^2 - \sigma^2}{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^{A_{\alpha_1}}))^2 - \sigma^2} > 1.$

Inequality (3.13) and Corollary 1 together imply that AIC asymptotically 
dominates $\mathrm{AIC}_\alpha$, $\alpha \neq 2$, in the sense that 

(3.14) $\liminf_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^{A_\alpha}))^2 - \sigma^2}{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^{A}))^2 - \sigma^2} \geq 1,$

with strict inequality holding for at least the algebraic-decay case (3.5). 
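
The loss incurred by a penalty weight $\alpha \neq 2$ in the algebraic-decay case can be illustrated with a small calculation. The sketch assumes the idealized form $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = C k^{-\beta}$, takes $L_n^{(\alpha)}(k) = (\alpha - 1)k\sigma^2/N + C k^{-\beta}$, and compares $L_n$ evaluated at the minimizer of $L_n^{(\alpha)}$ with $L_n$ at its own minimizer. Function names, parameter values and the search bound `K_max` are hypothetical choices made for the illustration.

```python
import numpy as np

def L_n(k, N, beta, C, sigma2=1.0):
    """Idealized L_n(k) = k*sigma2/N + C*k**(-beta)."""
    return k * sigma2 / N + C * k ** (-beta)

def k_star_alpha(alpha, N, beta, C, sigma2=1.0, K_max=2000):
    """Integer minimizer of L_n^{(alpha)}(k) over 1 <= k <= K_max."""
    k = np.arange(1, K_max + 1, dtype=float)
    L_alpha = (alpha - 1.0) * k * sigma2 / N + C * k ** (-beta)
    return float(k[np.argmin(L_alpha)])

def efficiency_ratio(alpha, N, beta=2.0, C=1.0, sigma2=1.0):
    """L_n at the order targeted by AIC_alpha, relative to L_n at the order targeted by AIC."""
    k_a = k_star_alpha(alpha, N, beta, C, sigma2)
    k_2 = k_star_alpha(2.0, N, beta, C, sigma2)
    return L_n(k_a, N, beta, C, sigma2) / L_n(k_2, N, beta, C, sigma2)

for alpha in (1.5, 2.0, 4.0):
    print(alpha, efficiency_ratio(alpha, N=10_000))
```

A heavier penalty ($\alpha > 2$) targets too small an order and a lighter one ($1 < \alpha < 2$) too large an order, so the ratio exceeds 1 in both directions, which is the behavior described by (3.13).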

Before leaving this section, we note that (3.12)-(3.14) seem to be the first 
results that can evaluate (compare) the performances of $\mathrm{AIC}_\alpha(k)$, $\alpha > 1$, 
from the same-realization prediction point of view. For more applications 
of these results, see Section 4. In addition to (3.5), we have also found a 
similar case, $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = C_5(\log k)^{\theta_3}k^{-\beta}$, with $C_5 > 0$, $-\infty < \theta_3 < \infty$, and 
$\beta > 1 + \delta_1$, where (3.14) holds only with strict inequality. However, to gain a 
deeper understanding of AIC it would be interesting to identify more AR($\infty$) 
models which can lead to the same property. 

4. Performances of AIC and its variants for independent- and 
same-realization predictions. Based on the results obtained in Section 3, this 
section aims to investigate how well AIC (or its variants) works for independent- 
and same-realization predictions. Let $\hat{k}_n \in \{1, 2, \ldots, K_n\}$ be determined by a 
certain order selection criterion with $K_n$ satisfying assumption (K.4). Define 

$\mathrm{PE}(\hat{x}_{n+1}(\hat{k}_n)) = \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n))^2 - \sigma^2}{\min_{1 \leq k \leq K_n} E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2}.$

We say that $\hat{k}_n$ is asymptotically efficient for same-realization predictions if 

(4.1) $\limsup_{n \to \infty} \mathrm{PE}(\hat{x}_{n+1}(\hat{k}_n)) \leq 1.$

Similarly, for independent-realization predictions, define $\mathrm{PEI}(\hat{y}_{n+1}(\hat{k}_n))$ as 

$\mathrm{PEI}(\hat{y}_{n+1}(\hat{k}_n)) = \frac{E(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 - \sigma^2}{\min_{1 \leq k \leq K_n} E(y_{n+1} - \hat{y}_{n+1}(k))^2 - \sigma^2}.$

We say that $\hat{k}_n$ is asymptotically efficient for independent-realization 
predictions if 

(4.2) $\limsup_{n \to \infty} \mathrm{PEI}(\hat{y}_{n+1}(\hat{k}_n)) \leq 1.$

Inequality (4.1) [(4.2)] says that, if $\hat{k}_n$ is determined by an 
asymptotically efficient criterion, then the prediction efficiency of 
$\hat{x}_{n+1}(\hat{k}_n)$ [$\hat{y}_{n+1}(\hat{k}_n)$] relative to the best 
predictor (from the MSPE point of view) among $\{\hat{x}_{n+1}(1), \ldots, \hat{x}_{n+1}(K_n)\}$ 
[$\{\hat{y}_{n+1}(1), \ldots, \hat{y}_{n+1}(K_n)\}$], that is, $\mathrm{PE}(\hat{x}_{n+1}(\hat{k}_n))$ 
[$\mathrm{PEI}(\hat{y}_{n+1}(\hat{k}_n))$], will ultimately not exceed 1. 
Proposition 3 and Theorem 1 yield that 

(4.3) $\lim_{n \to \infty} \mathrm{PEI}(\hat{y}_{n+1}(\hat{k}_n)) = 1,$

where $\hat{k}_n = \hat{k}_n^S$, $\hat{k}_n^A$, $\hat{k}_n^F$, $\hat{k}_n^{S_p}$ or $\hat{k}_n^C$. Therefore, AIC, FPE, $S_n(k)$, $S_p$ and $C_p$ 
are all asymptotically efficient for independent-realization predictions. 
According to Proposition 2 and Theorem 2, we have 

(4.4) $\lim_{n \to \infty} \mathrm{PE}(\hat{x}_{n+1}(\hat{k}_n)) = 1,$

where $\hat{k}_n = \hat{k}_n^S$, $\hat{k}_n^A$, $\hat{k}_n^F$, $\hat{k}_n^{S_p}$ or $\hat{k}_n^C$, which shows that these criteria are also 
asymptotically efficient for same-realization predictions. In addition, if it is 
already known that the exponential-decay case (3.11) holds, then (3.12) and 
Remark 3 suggest that more options, for example, $\mathrm{AIC}_\alpha$ [$\mathrm{FPE}_\alpha$, $S_n^{(\alpha)}(k)$], 
with any $\alpha > 1$, are available for achieving asymptotic efficiency. However, 
AIC and its variants cannot be replaced by $\mathrm{AIC}_\alpha$ [$\mathrm{FPE}_\alpha$, $S_n^{(\alpha)}(k)$], $\alpha \neq 2$, 
in general, since (3.13) and (4.4) imply that the latter criterion is not 
asymptotically efficient in the algebraic-decay case. 

To gain further insight into the practical implications of asymptotically 
efficient criteria for same-realization predictions, a simulation study is 
conducted. Let observations be generated from an ARMA(1, 1) model 

$x_{t+1} = \phi_0 x_t + e_{t+1} - \theta_0 e_t,$

where the $e_t$'s are independent and identically $N(0, 1)$ distributed, 
$\phi_0 = \pm 0.9, \pm 0.7, \pm 0.5$ and $\theta_0 = \pm 0.8, \pm 0.6$. For each combination 
$(\phi_0, \theta_0)$, the empirical estimates of $\mathrm{PE}(\hat{x}_{n+1}(\hat{k}_n^A))$, denoted by 
$\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$, are obtained 
based on 20,000 replications for $(n, K_n)$ = (60, 7), (120, 10), (200, 14), (500, 22) 
and (1000, 31). (Note that $K_n$ here is set to the largest integer $\leq n^{1/2}$.) In 
addition, the empirical estimates of 

$\gamma_{\mathrm{opt}}(n, K_n) = \frac{\min_{1 \leq k \leq K_n} E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2}{\min_{1 \leq k \leq 7} E(x_{61} - \hat{x}_{61}(k))^2 - \sigma^2},$

denoted by $\hat{\gamma}_{\mathrm{opt}}(n, K_n)$, with $\sigma^2 = 1$ and $(n, K_n)$ = (60, 7), (120, 10), (200, 14), 
(500, 22) and (1000, 31), are also obtained based on the same ARMA(1, 1) 
model and 20,000 replications. [Note that $\gamma_{\mathrm{opt}}(n, K_n)$ is used to illustrate 
how fast $\min_{1 \leq k \leq K_n} E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2$ decreases as $n$ and $K_n$ 
simultaneously increase.] According to the rate of convergence of $\hat{\gamma}_{\mathrm{opt}}(n, K_n)$, 
these empirical results (which are summarized in Table 1) can be classified 
into three categories. [Since $\gamma_{\mathrm{opt}}(60, 7) = 1$, $\hat{\gamma}_{\mathrm{opt}}(60, 7)$ is set to 1 in Table 1.] 
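
A scaled-down version of such an experiment might look like the sketch below. It is an illustration under stated assumptions, not the procedure behind Table 1: it uses far fewer replications, a burn-in period to approximate stationarity, and the identity $E(x_{n+1} - \hat{x}_{n+1}(k))^2 - \sigma^2 = E(x_{n+1} - e_{n+1} - \hat{x}_{n+1}(k))^2$ (valid because $e_{n+1}$ is independent of the past) to estimate the excess MSPE with less noise. The noise indexing $x_t = \phi_0 x_{t-1} + e_t - \theta_0 e_{t-1}$ and all function names are assumptions of this sketch.

```python
import math
import numpy as np

def simulate_arma11(n, phi, theta, rng, burn=200):
    """Return x_1..x_{n+1} and the matching noise from x_t = phi*x_{t-1} + e_t - theta*e_{t-1}."""
    total = n + burn + 2
    e = rng.standard_normal(total)
    x = np.zeros(total)
    for t in range(1, total):
        x[t] = phi * x[t - 1] + e[t] - theta * e[t - 1]
    return x[-(n + 1):], e[-(n + 1):]

def aic_and_excess(x_full, e_next, K_n):
    """For k = 1..K_n: AIC(k), and squared gap between the AR(k) forecast and the ideal forecast."""
    x, x_next = x_full[:-1], x_full[-1]
    n = len(x)
    N = n - K_n
    y = x[K_n:n]
    aic = np.empty(K_n)
    excess = np.empty(K_n)
    for k in range(1, K_n + 1):
        X = np.array([x[j - k:j][::-1] for j in range(K_n, n)])
        a_hat = -np.linalg.solve(X.T @ X, X.T @ y)
        resid = y + X @ a_hat
        aic[k - 1] = np.log(resid @ resid / N) + 2 * k / N
        x_hat = -x[-k:][::-1] @ a_hat
        excess[k - 1] = (x_next - e_next - x_hat) ** 2
    return aic, excess

def estimate_pe(n, phi, theta, reps=200, seed=0):
    """Monte Carlo estimate of PE for the AIC-selected order, with K_n = floor(sqrt(n))."""
    rng = np.random.default_rng(seed)
    K_n = math.isqrt(n)
    excess_by_k = np.zeros(K_n)
    excess_aic = 0.0
    for _ in range(reps):
        x_full, e = simulate_arma11(n, phi, theta, rng)
        aic, excess = aic_and_excess(x_full, e[-1], K_n)
        excess_by_k += excess
        excess_aic += excess[np.argmin(aic)]
    return excess_aic / excess_by_k.min()
```

Calling `estimate_pe(60, 0.5, 0.8)` gives a rough estimate for one $(\phi_0, \theta_0)$ combination; with only a few hundred replications the result is noisy and should not be compared digit for digit with the tabulated values.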
We first observe that the fast rate of decrease of $\hat{\gamma}_{\mathrm{opt}}(n, K_n)$ clearly occurs
when $\mathrm{sgn}(\phi_0) \ne \mathrm{sgn}(\theta_0)$, $(\phi_0, \theta_0) = (0.9, 0.6)$, or $(\phi_0, \theta_0) = (-0.9, -0.6)$,
where, for a nonzero real number $a$, $\mathrm{sgn}(a) = 1$ if $a > 0$ and $\mathrm{sgn}(a) = -1$
if $a < 0$. In these cases, we also observe that the rate of convergence of
$\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ is very slow and the values fluctuate around a certain number
which is not distant from 1. In particular, if $\theta_0 = \pm 0.8$, the values fluctuate
around 1.25, while, if $\theta_0 = \pm 0.6$, the values fluctuate around 1.5 (or slightly
higher). The second category contains those parameter combinations satisfying
$|\phi_0 - \theta_0| = 0.1$. The rate of decrease of $\hat{\gamma}_{\mathrm{opt}}(n, K_n)$ in this category is
obviously slower than that in the first category. On the other hand, except
in the cases where $(\phi_0, \theta_0) = (0.9, 0.8)$ and $(-0.9, -0.8)$, the advantage of
increasing $n$ and $K_n$ in reducing the values of $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ becomes quite
significant and the ratio $\widehat{\mathrm{PE}}(\hat{x}_{1001}(\hat{k}_{1000}^A)) / \widehat{\mathrm{PE}}(\hat{x}_{61}(\hat{k}_{60}^A))$ is smaller than 2/3.
When $(\phi_0, \theta_0) = (0.9, 0.8)$ and $(-0.9, -0.8)$, the values of $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ are
smaller than those in the other cases in this category. However, the reduction
in the values of $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ is also much smaller (only a slightly decreasing
trend can be observed). Another observation regarding this category is
that, as $(n, K_n)$ increases to $(1000, 31)$, $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ decreases to a value
around 1.5 if $\theta_0 = \pm 0.8$, and decreases to a value around 1.95 if $\theta_0 = \pm 0.6$.
The third category contains the remaining parameter combinations, namely,
$(\phi_0, \theta_0) = (0.5, 0.8)$ and $(-0.5, -0.8)$. The rate of convergence of $\hat{\gamma}_{\mathrm{opt}}(n, K_n)$
in this category is intermediate (slower than in the first category, but faster
than in the second category), while the rate for $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ is slow and
its value fluctuates around 1.5. In summary, although the convergence rate
of $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ is slow when $|\phi_0 - \theta_0| \ge 0.3$ (which includes the first and
the third categories), the relatively small values of $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$, accompanied
by a (very) fast rate of decrease of $\hat{\gamma}_{\mathrm{opt}}(n, K_n)$, show that, even in finite
sample situations, AIC also yields a satisfactory same-realization prediction
for AR($\infty$) models, provided the number of candidate models is allowed to
increase with the sample size. On the other hand, when $|\phi_0 - \theta_0| = 0.1$, the
(prediction) efficiency of AIC is not satisfactory (except in the cases where
$|\phi_0| = 0.9$) and the reduction in the values of $\hat{\gamma}_{\mathrm{opt}}(n, K_n)$ through increasing
$n$ and $K_n$ also becomes relatively insignificant. However, the efficiency of
AIC can be substantially improved (except in the cases where $|\phi_0| = 0.9$) if
$n$ and $K_n$ are allowed to increase simultaneously.
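The 2/3 claim for the second category can be checked directly against the P columns of Table 1. The short script below is only a bookkeeping aid; the dictionary transcribes the Table 1 entries at $(n, K_n) = (60, 7)$ and $(1000, 31)$ for the eight combinations with $|\phi_0 - \theta_0| = 0.1$, and the variable names are illustrative.

```python
# (phi0, theta0) -> (P at n/K_n = 60/7, P at n/K_n = 1000/31), copied from Table 1
pe = {
    (0.9, 0.8): (1.75, 1.48),
    (0.7, 0.8): (2.71, 1.56),
    (0.7, 0.6): (2.97, 1.95),
    (0.5, 0.6): (3.10, 1.99),
    (-0.9, -0.8): (1.68, 1.46),
    (-0.7, -0.8): (2.49, 1.62),
    (-0.7, -0.6): (2.90, 1.93),
    (-0.5, -0.6): (3.48, 1.99),
}

# Ratio of the estimate at n = 1000 to the estimate at n = 60
ratios = {key: last / first for key, (first, last) in pe.items()}

for (phi0, theta0), r in sorted(ratios.items()):
    flag = "below 2/3" if r < 2 / 3 else "not below 2/3"
    print(f"phi0={phi0:+.1f} theta0={theta0:+.1f} ratio={r:.3f} ({flag})")
```

Only the two combinations with $|\phi_0| = 0.9$ fail the 2/3 threshold, in line with the exception stated in the text.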

In the rest of this section, the question of whether AIC is asymptotically
optimal among all order selection criteria from the MSPE point of view is
investigated. This question led us to define a stronger version of asymptotic
efficiency. An order selection criterion $\hat{k}_n$ with $1 \le \hat{k}_n \le K_n$ is said to be
strongly asymptotically efficient for same-realization predictions if

(4.5)  $$\lim_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n))^2 - \sigma^2}{\inf_{I_n \in J_n} E(x_{n+1} - \hat{x}_{n+1}(I_n))^2 - \sigma^2} = 1,$$
Table 1
Empirical estimates of $\mathrm{PE}(\hat{x}_{n+1}(\hat{k}_n^A))$ and $\gamma_{\mathrm{opt}}(n, K_n)$

                    theta0 = 0.8   theta0 = 0.6   theta0 = -0.6  theta0 = -0.8
 phi0   n/K_n        P      R       P      R       P      R       P      R
 -0.9   60/7        1.23   1       1.46   1       1.64   1       1.68   1
        120/10      1.21   0.56    1.46   0.54    1.58   0.58    1.55   0.71
        200/14      1.29   0.36    1.56   0.34    1.67   0.38    1.52   0.54
        500/22      1.25   0.17    1.56   0.15    1.63   0.17    1.53   0.34
        1000/31     1.29   0.09    1.54   0.08    1.58   0.10    1.46   0.17
 -0.7   60/7        1.23   1       1.44   1       2.90   1       2.49   1
        120/10      1.25   0.57    1.49   0.54    2.48   0.68    2.37   0.70
        200/14      1.29   0.37    1.61   0.35    2.25   0.51    1.95   0.59
        500/22      1.31   0.17    1.53   0.16    1.97   0.27    1.65   0.38
        1000/31     1.28   0.10    1.56   0.09    1.93   0.17    1.62   0.23
 -0.5   60/7        1.23   1       1.50   1       3.48   1       1.49   1
        120/10      1.25   0.57    1.62   0.53    3.01   0.65    1.47   0.67
        200/14      1.33   0.36    1.57   0.35    2.78   0.47    1.53   0.46
        500/22      1.33   0.17    1.56   0.16    2.29   0.27    1.44   0.21
        1000/31     1.26   0.10    1.56   0.09    1.99   0.17    1.42   0.13
  0.5   60/7        1.55   1       3.10   1       1.49   1       1.25   1
        120/10      1.51   0.67    2.98   0.63    1.59   0.55    1.23   0.56
        200/14      1.48   0.46    2.86   0.45    1.55   0.38    1.31   0.37
        500/22      1.47   0.22    2.45   0.26    1.61   0.16    1.28   0.17
        1000/31     1.41   0.13    1.99   0.16    1.57   0.09    1.32   0.10
  0.7   60/7        2.71   1       2.97   1       1.55   1       1.25   1
        120/10      2.31   0.71    2.56   0.62    1.58   0.53    1.25   0.56
        200/14      1.92   0.62    2.31   0.48    1.61   0.36    1.29   0.37
        500/22      1.79   0.37    2.04   0.27    1.53   0.16    1.28   0.18
        1000/31     1.56   0.24    1.95   0.16    1.44   0.09    1.31   0.10
  0.9   60/7        1.75   1       1.58   1       1.43   1       1.24   1
        120/10      1.56   0.70    1.61   0.57    1.50   0.53    1.23   0.57
        200/14      1.57   0.51    1.66   0.37    1.58   0.32    1.31   0.37
        500/22      1.49   0.29    1.68   0.17    1.54   0.15    1.31   0.17
        1000/31     1.48   0.17    1.57   0.10    1.47   0.08    1.29   0.10

Note. Column P denotes the empirical estimates of $\mathrm{PE}(\hat{x}_{n+1}(\hat{k}_n^A))$ and column R denotes
the empirical estimates of $\gamma_{\mathrm{opt}}(n, K_n)$.



and is said to be strongly asymptotically efficient for independent-realization
predictions if

(4.6)  $$\lim_{n \to \infty} \frac{E(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 - \sigma^2}{\inf_{I_n \in J_n} E(y_{n+1} - \hat{y}_{n+1}(I_n))^2 - \sigma^2} = 1.$$

Here $J_n$ in (4.5) and (4.6) is the family of all $\mathcal{G}_n$-measurable random variables
taking values on $\{1, 2, \ldots, K_n\}$, and $\mathcal{G}_n$ is the $\sigma$-algebra generated by
$\{x_1, \ldots, x_n\}$. Observe that, for any random variable $I_n \in J_n$,

$$E(x_{n+1} - \hat{x}_{n+1}(I_n))^2 = E\Bigl\{\sum_{k=1}^{K_n} E[(x_{n+1} - \hat{x}_{n+1}(k))^2 \mid x_1, \ldots, x_n]\, I_{\{I_n = k\}}\Bigr\}
\ge E\Bigl\{\min_{1 \le k \le K_n} E[(x_{n+1} - \hat{x}_{n+1}(k))^2 \mid x_1, \ldots, x_n]\Bigr\},$$

where $I_{\{I_n = k\}} = 1$ if $I_n = k$ and $I_{\{I_n = k\}} = 0$ if $I_n \ne k$. Also notice that the
minimizer of

$$E[(x_{n+1} - \hat{x}_{n+1}(k))^2 \mid x_1, \ldots, x_n], \qquad k = 1, 2, \ldots, K_n,$$

is a member of $J_n$. Therefore

$$\inf_{I_n \in J_n} E(x_{n+1} - \hat{x}_{n+1}(I_n))^2 = E\Bigl\{\min_{1 \le k \le K_n} E[(x_{n+1} - \hat{x}_{n+1}(k))^2 \mid x_1, \ldots, x_n]\Bigr\}$$

and, hence, (4.5) can be rewritten as

(4.7)  $$\lim_{n \to \infty} \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n))^2 - \sigma^2}{E\{\min_{1 \le k \le K_n} E[(x_{n+1} - \hat{x}_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2\}} = 1.$$

Similarly, (4.6) can be rewritten as

(4.8)  $$\lim_{n \to \infty} \frac{E(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 - \sigma^2}{E\{\min_{1 \le k \le K_n} E[(y_{n+1} - \hat{y}_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2\}} = 1.$$
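The benchmark in (4.7) and (4.8) places the minimum inside the outer expectation, whereas the benchmark used for $\gamma_{\mathrm{opt}}$ takes the minimum over $k$ of unconditional expectations. The former can never exceed the latter, since a pathwise minimum is at most each individual term. The toy script below only illustrates this inequality; the "conditional MSPEs" are hypothetical random numbers (1 plus an exponential draw), not output from any AR model, and the function name is illustrative.

```python
import random

def benchmark_gap(num_orders=7, num_paths=20000, seed=0):
    """Compare E[min_k c_k] with min_k E[c_k] for toy conditional MSPEs c_k."""
    rng = random.Random(seed)
    sums = [0.0] * num_orders      # running sums of c_k, one per candidate order
    sum_of_min = 0.0               # running sum of min_k c_k
    for _ in range(num_paths):
        # hypothetical conditional MSPEs for orders k = 1..num_orders on one path
        cond = [1.0 + rng.expovariate(1.0) for _ in range(num_orders)]
        for k, value in enumerate(cond):
            sums[k] += value
        sum_of_min += min(cond)
    mean_of_min = sum_of_min / num_paths             # plays the role of E{min_k E[.|x]}
    min_of_means = min(s / num_paths for s in sums)  # plays the role of min_k E(.)
    return mean_of_min, min_of_means

mean_of_min, min_of_means = benchmark_gap()
print(mean_of_min, min_of_means)
```

Because the denominator in (4.7) is the smaller of the two benchmarks, a criterion can be efficient relative to $\min_k E(\cdot)$ and still have a ratio in (4.7) well above 1.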

To examine whether AIC satisfies (4.5) [or (4.7)], it suffices to check
whether

$$r_n^* = \frac{E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n^A))^2 - \sigma^2}{E\{\min_{1 \le k \le K_n} E[(x_{n+1} - \hat{x}_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2\}}$$

Table 2
Simulation results for $\hat{r}_n^*$

                       theta0
    n    K_n     0.8     0.6    -0.6    -0.8
   60     7     5.61    6.04    6.48    5.60
  120    10     7.44    7.82    7.97    8.16
  200    14     9.68    9.17    9.66    9.17
  500    22    13.10   12.30   12.48   13.54
 1000    31    16.18   13.56   13.84   15.72
converges to 1. Instead of evaluating $r_n^*$ theoretically, empirical estimates of
$r_n^*$, denoted by $\hat{r}_n^*$, with $(n, K_n) = (60, 7), (120, 10), (200, 14), (500, 22), (1000, 31)$,
are obtained based on the MA(1) model

(4.9)  $$x_t = e_t - \theta_0 e_{t-1}$$

and 20,000 replications. Here we take the noise $\{e_t\}$ to be i.i.d. $N(0, 1)$
and use the parameter values $\theta_0 = 0.8, 0.6, -0.6, -0.8$. These estimates are
summarized in Table 2. It can be seen from this table that when $(n, K_n) =
(60, 7)$, values of $\hat{r}_n^*$ are larger than 5.5 for all $\theta_0$. Moreover, $\hat{r}_n^*$ increases
considerably as $n$ and $K_n$ grow. In particular, when $n$ and $K_n$ increase to
1000 and 31, respectively, all values of $\hat{r}_n^*$ are larger than 13.5. In view of the
relatively moderate values given in Table 1, Table 2 suggests that it is
very difficult for AIC to achieve (4.7). This is a somewhat different situation
from that encountered in independent-realization settings. To see this, let
us restrict ourselves to model (4.9) again. Motivated by (4.8), the empirical
estimates of

$$r_{I,n}^* = \frac{E(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n^A))^2 - \sigma^2}{E\{\min_{1 \le k \le K_n} E[(y_{n+1} - \hat{y}_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2\}},$$

denoted by $\hat{r}_{I,n}^*$, with $(n, K_n) = (60, 7), (120, 10), (200, 14), (500, 22), (1000, 31)$
and $\theta_0 = 0.8, 0.6, -0.6, -0.8$, are obtained based on 20,000 replications (see
Table 3). Table 3 shows that values of $\hat{r}_{I,n}^*$, like values of $\widehat{\mathrm{PE}}(\hat{x}_{n+1}(\hat{k}_n^A))$ in
Table 1, are not distant from 1, particularly for large $n$, $K_n$ and $|\theta_0|$. In fact,
by (5.36) (see Section 5) and Theorem 1, it can be shown that

(4.10)  $$\lim_{n \to \infty} r_{I,n}^* = 1,$$

provided the assumptions of Theorem 1 hold. Therefore, for independent-realization
predictions, AIC is asymptotically efficient, as well as strongly
asymptotically efficient. [Note that (4.10) also holds with $\hat{k}_n^A$ replaced by
$\hat{k}_n^F$, $\hat{k}_n^S$, $\hat{k}_n^{S_p}$ or $\hat{k}_n^C$.] Related to (4.10), but from a conditional MSPE point
of view, Shibata [27] showed that, for a Gaussian AR($\infty$) process,

(4.11)  $$\frac{E\{(y_{n+1} - \hat{y}_{n+1}(\hat{k}_n))^2 \mid x_1, \ldots, x_n\} - \sigma^2}{\inf_{I_n \in J_n} E\{(y_{n+1} - \hat{y}_{n+1}(I_n))^2 \mid x_1, \ldots, x_n\} - \sigma^2} - 1 = o_p(1)$$

holds with $\hat{k}_n = \hat{k}_n^A$, $\hat{k}_n^F$ or $\hat{k}_n^S$. An order selection criterion $\hat{k}_n$ is said to
be asymptotically efficient in Shibata's paper if it satisfies (4.11). For an
equivalent definition of (4.11), see (5.37).
5. Proofs. We first introduce two frequently used results, Lemmas 1 and 2.
The proofs of Lemmas 1 and 2 are similar to those of Lemmas 3 and 4 of [17],
respectively. To save space, we skip the details.

Lemma 1. Assume that (K.1)(a) holds and $\sup_{-\infty < t < \infty} E(|e_t|^{2q}) < \infty$
for some $q > 2$. Let $\{m_{i,n}\}$, $i = 0, 1, 2$, be sequences of positive integers
satisfying $m_{2,n} \ge m_{1,n} \ge m_{0,n}$ for all $n \ge 1$. Then, for all $1 \le k \le m_{0,n}$,

(5.1)  $$E\Bigl\|\frac{1}{\sqrt{m_n}} \sum_{j=m_{1,n}}^{m_{2,n}} \mathbf{x}_j(k)(e_{j+1,k} - e_{j+1})\Bigr\|^q \le C k^{q/2} \|\mathbf{a} - \mathbf{a}(k)\|_R^q,$$

where $m_n = m_{2,n} - m_{1,n} + 1$, $e_{j+1,k}$ is defined after (2.3), $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$ is
defined in Proposition 2, and for a $k$-dimensional vector $\mathbf{v} = (v_1, \ldots, v_k)'$,
$\|\mathbf{v}\|^2 = \sum_{i=1}^{k} v_i^2$. [Note that if (K.5) holds, then $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 > 0$ for all $k =
1, 2, \ldots$. In this case (5.1) can be expressed as

$$\max_{1 \le k \le m_{0,n}} \frac{E\|(1/\sqrt{m_n}) \sum_{j=m_{1,n}}^{m_{2,n}} \mathbf{x}_j(k)(e_{j+1,k} - e_{j+1})\|^q}{k^{q/2} \|\mathbf{a} - \mathbf{a}(k)\|_R^q} \le C.]$$

Lemma 2. Assume that (K.1)(a) holds and $\sup_{-\infty < t < \infty} E\{|e_t|^q\} < \infty$
for $q > 2$. Let $\{m_{i,n}\}$, $i = 0, 1, 2$, and $\{m_n\}$ be defined as in Lemma 1. Then

(5.2)  $$\max_{1 \le k \le m_{0,n}} (k^{-q/2})\, E\Bigl\|\frac{1}{\sqrt{m_n}} \sum_{j=m_{1,n}}^{m_{2,n}} \mathbf{x}_j(k) e_{j+1}\Bigr\|^q \le C$$

and

(5.3)  $$\max_{1 \le k_1 < k_2 \le m_{0,n}} (k_2 - k_1)^{-q/2}\, E\Bigl\|\frac{1}{\sqrt{m_n}} \sum_{j=m_{1,n}}^{m_{2,n}} (\mathbf{x}_j(k_2) - \mathbf{x}_j(k_1)) e_{j+1}\Bigr\|^q \le C,$$

where $\mathbf{x}_j(k_1)$ in (5.3) is regarded as a $k_2$-dimensional vector with undefined
entries set to 0.



Table 3
Simulation results for $\hat{r}_{I,n}^*$

                       theta0
    n    K_n     0.8     0.6    -0.6    -0.8
   60     7     1.54    2.10    2.12    1.56
  120    10     1.50    2.05    2.12    1.49
  200    14     1.57    1.93    1.90    1.59
  500    22     1.47    1.79    1.91    1.44
 1000    31     1.42    1.81    1.75    1.41
Lemmas 3-6, listed below, are essential tools for verifying Theorems 1 and 2.
To provide motivation for Lemma 3, we note that, under (K.1)(a) and the
Gaussian assumption on $\{e_t\}$, Shibata ([27], Lemmas 3.2 and 3.4) showed
that, for $q = 2, 4$ and $1 \le k \le K_n$,

(5.4)  $$E\Bigl\|\frac{1}{\sqrt{N}} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|_{R^{-1}(k)}^{q} = \sum_{i=1}^{q/2} C_{i,q} k^i + O(1/N) k^q,$$

where, for a $k \times k$ symmetric matrix $A$ and a $k$-dimensional vector $\mathbf{y}$,
$\|\mathbf{y}\|_A^2 = \mathbf{y}' A \mathbf{y}$, and the $C_{i,q}$'s, independent of $n$ and $k$, are some positive
numbers. Under (K.1)(c) with the i.i.d. assumption on $\{e_t\}$ and the assumption
that $E|e_1|^{16} < \infty$, Karagrigoriou ([19], Lemma 3.1) also gave a
result similar to Shibata's. However, since they needed to calculate $e_t$'s (or
$x_t$'s) $4q$th cross moments, extensions of their approaches to large $q$ cases
are not easy due to heavy computational burdens. For example, in order to
verify (5.4) with $q = 6$ through their approaches, one must deal with $e_t$'s
(or $x_t$'s) 24th cross moments. In addition, even if (5.4) holds for large $q$'s,
the remainder term, $O(1/N) k^q$, may dominate the main term, $\sum_{i=1}^{q/2} C_{i,q} k^i$
(which is of order $k^{q/2}$), in situations where $k^{q/2}/N$ is large as well. This
causes another difficulty since bounding the left-hand side of (5.4) by $C k^{q/2}$
for sufficiently large $q$ and all $1 \le k \le K_n$ seems indispensable for our analysis,
especially for proving Theorem 2. Under a slightly stronger assumption
(than Shibata's) on AR coefficients, the difficulties mentioned above are
resolved by Lemma 3.



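Lemma 3. Assume (K.1)(b), $\sup_{-\infty < t < \infty} E|e_t|^{2q} < \infty$ for some $q > 2$,
and $K_n = O(n^{1/2})$. Then

(5.5)  $$\max_{1 \le k \le K_n} k^{-q/2}\, E\Bigl|\, \Bigl\|\frac{1}{\sqrt{N}} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|_{R^{-1}(k)}^2 - k\sigma^2 \Bigr|^{q} \le C.$$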



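Proof. First observe that

(5.6)  $$\begin{aligned}
E\Bigl|\frac{1}{k^{1/2}}\Bigl(\Bigl\|\frac{1}{\sqrt{N}} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|_{R^{-1}(k)}^2 - k\sigma^2\Bigr)\Bigr|^q
&= E\Bigl|\frac{1}{N k^{1/2}} \sum_{j=K_n}^{n-1} \bigl(\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k) e_{j+1}^2 - k\sigma^2\bigr) \\
&\qquad + \frac{2}{N k^{1/2}} \sum_{l=K_n+1}^{n-1} \sum_{j=K_n}^{l-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1} e_{l+1}\Bigr|^q \\
&\le C\Bigl\{ E\Bigl|\frac{1}{N k^{1/2}} \sum_{j=K_n}^{n-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)(e_{j+1}^2 - \sigma^2)\Bigr|^q \\
&\qquad + E\Bigl|\frac{1}{N k^{1/2}} \sum_{j=K_n}^{n-1} \bigl(\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k) - k\bigr)\Bigr|^q \\
&\qquad + E\Bigl|\frac{1}{N k^{1/2}} \sum_{l=K_n+1}^{n-1} \sum_{j=K_n}^{l-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1} e_{l+1}\Bigr|^q \Bigr\} \\
&= C\{(\mathrm{I}) + (\mathrm{II}) + (\mathrm{III})\}.
\end{aligned}$$

Since $\{\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)(e_{j+1}^2 - \sigma^2), \mathcal{M}_{j+1}\}$ is a sequence of martingale
differences, where $\mathcal{M}_{j+1}$ is the $\sigma$-algebra generated by $\{e_{j+1}, e_j, \ldots\}$, and

$$E\Bigl|\sum_{j=K_n}^{n-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)(e_{j+1}^2 - \sigma^2)\Bigr|^q \le E \sup_{K_n \le m \le n-1} \Bigl|\sum_{j=K_n}^{m} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)(e_{j+1}^2 - \sigma^2)\Bigr|^q,$$

by Wei ([29], Lemma 2), the assumption that $\sup_{-\infty < t < \infty} E|e_{t+1}|^{2q} < \infty$ and
the convexity of $x^{q/2}$, $x > 0$,

$$(\mathrm{I}) \le C E\Bigl\{\Bigl(\frac{1}{N k^{1/2}}\Bigr)^2 \sum_{j=K_n}^{n-1} \bigl(\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)\bigr)^2\Bigr\}^{q/2}
\le C \Bigl(\frac{1}{N k}\Bigr)^{q/2} \frac{1}{N} \sum_{j=K_n}^{n-1} E|\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)|^q.$$

Simple algebraic manipulations and Remark 1 yield

$$E|\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)|^q \le E\|\mathbf{x}_j(k)\|^{2q} \|R^{-1}(k)\|^q \le C E\|\mathbf{x}_j(k)\|^{2q} \le C k^q k^{-1} \sum_{l=1}^{k} E|x_{j-l+1}|^{2q}.$$

Since $x_{j-l+1} = \sum_{i=0}^{\infty} b_i e_{j-l+1-i}$, by Wei ([29], Lemma 2), one has, for all
integers $j$ and $l$,

(5.7)  $$E|x_{j-l+1}|^{2q} \le C\Bigl(\sum_{i=0}^{\infty} b_i^2\Bigr)^q \le C,$$

which further implies, for all $K_n \le j \le n$ and all $1 \le k \le K_n$,

(5.8)  $$E|\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k)|^q \le C k^q.$$

As a result,

(5.9)  $$(\mathrm{I}) \le C \Bigl(\frac{k}{N}\Bigr)^{q/2}$$

holds for all $1 \le k \le K_n$.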

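Observe that, for all $1 \le k \le K_n$,

$$\begin{aligned}
E\Bigl|\frac{1}{N} \sum_{j=K_n}^{n-1} \bigl(\mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_j(k) - k\bigr)\Bigr|^q
&= E\bigl|\mathrm{tr}\{R^{-1}(k)(\hat{R}_n(k) - R(k))\}\bigr|^q \\
&\le \Bigl\{\sum_{i=1}^{k} \sum_{j=1}^{k} |R^{ij}(k)| \bigl(E|\hat{r}_{i,j}^{(n)} - r_{i,j}|^q\bigr)^{1/q}\Bigr\}^q
\le C \frac{k^{3q/2}}{N^{q/2}},
\end{aligned}$$

where $R^{ij}(k)$, $\hat{r}_{i,j}^{(n)}$ and $r_{i,j}$ denote the components of $R^{-1}(k)$, $\hat{R}_n(k)$
and $R(k)$, respectively, the first inequality follows from Minkowski's inequality,
and the second inequality follows from an inequality given after (2.19)
of [17] [which shows that, for all $1 \le i, j \le k \le K_n$, $E|\hat{r}_{i,j}^{(n)} - r_{i,j}|^q \le C N^{-q/2}$]
and the fact that $\sum_{i=1}^{k} \sum_{j=1}^{k} |R^{ij}(k)| \le C k^{3/2}$ (see also [21], page 98).
Consequently, we have, for all $1 \le k \le K_n$,

(5.10)  $$(\mathrm{II}) \le C \frac{k^{q}}{N^{q/2}}.$$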



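By Wei ([29], Lemma 2), the moment assumption on $\{e_t\}$ and some algebraic
manipulations,

(5.11)  $$\begin{aligned}
(\mathrm{III}) &\le C E\Bigl[\Bigl(\frac{1}{N k^{1/2}}\Bigr)^2 \sum_{l=K_n+1}^{n-1} \Bigl(\sum_{j=K_n}^{l-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1}\Bigr)^2\Bigr]^{q/2} \\
&\le C\Bigl\{ \Bigl(\frac{1}{N k^{1/2}}\Bigr)^q E\Bigl[\sum_{l=K_n+1}^{3K_n} \Bigl(\sum_{j=K_n}^{l-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1}\Bigr)^2\Bigr]^{q/2} \\
&\qquad + \Bigl(\frac{1}{N k^{1/2}}\Bigr)^q E\Bigl[\sum_{l=3K_n+1}^{n-1} \Bigl(\sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1}\Bigr)^2\Bigr]^{q/2} \\
&\qquad + \Bigl(\frac{1}{N k^{1/2}}\Bigr)^q E\Bigl[\sum_{l=3K_n+1}^{n-1} \Bigl(\sum_{j=l-2K_n}^{l-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1}\Bigr)^2\Bigr]^{q/2} \Bigr\} \\
&= C\{(\mathrm{IV}) + (\mathrm{V}) + (\mathrm{VI})\}.
\end{aligned}$$

By the Cauchy-Schwarz inequality, (5.8), Remark 1 and (5.2), we have, for
all $K_n + 1 \le l \le 3K_n$,

(5.12)  $$E\Bigl|\sum_{j=K_n}^{l-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1}\Bigr|^q
\le \Bigl\{E|\mathbf{x}_l'(k) R^{-1}(k) \mathbf{x}_l(k)|^q\, E\Bigl\|\sum_{j=K_n}^{l-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|^{2q}\Bigr\}^{1/2} \|R^{-1}(k)\|^{q/2}
\le C k^q K_n^{q/2},$$

and for all $3K_n + 1 \le l \le n - 1$,

(5.13)  $$E\Bigl|\sum_{j=l-2K_n}^{l-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1}\Bigr|^q \le C k^q K_n^{q/2}.$$

Hence, by the convexity of $x^{q/2}$, $x > 0$, (5.12) and (5.13), one obtains, for all
$1 \le k \le K_n$,

(5.14)  $$(\mathrm{IV}) \le C \Bigl(\frac{k K_n^2}{N^2}\Bigr)^{q/2} \quad \text{and} \quad (\mathrm{VI}) \le C \Bigl(\frac{k K_n}{N}\Bigr)^{q/2}.$$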

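In view of (5.9), (5.10), (5.11) and (5.14), this proof is complete if we can
show that (V) is bounded. Observe that

(5.15)  $$\begin{aligned}
E\Bigl|\sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l(k) e_{j+1}\Bigr|^q
&\le C\Bigl\{ E\Bigl|\sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j'(k) R^{-1}(k) (\mathbf{x}_l(k) - \mathbf{x}_l^{(n)}(k)) e_{j+1}\Bigr|^q \\
&\qquad + E\Bigl|\sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j'(k) R^{-1}(k) \mathbf{x}_l^{(n)}(k) e_{j+1}\Bigr|^q \Bigr\} \\
&= C\{(\mathrm{VII}) + (\mathrm{VIII})\},
\end{aligned}$$

where

$$\mathbf{x}_l^{(n)}(k) = \Bigl(\sum_{s=0}^{K_n} b_s e_{l-s}, \ldots, \sum_{s=0}^{K_n} b_s e_{l+1-k-s}\Bigr)'$$

and $b_j$ is defined from the infinite MA representation in Remark 1. Since

$$\mathbf{x}_l(k) - \mathbf{x}_l^{(n)}(k) = \Bigl(\sum_{s=K_n+1}^{\infty} b_s e_{l-s}, \ldots, \sum_{s=K_n+1}^{\infty} b_s e_{l+1-k-s}\Bigr)',$$

reasoning as for (5.8) and (5.12), we have, for all $3K_n + 1 \le l \le n - 1$, that

(5.16)  $$(\mathrm{VII}) \le C \Bigl(k \sum_{j=K_n+1}^{\infty} b_j^2\Bigr)^{q/2} N^{q/2} k^{q/2}.$$

It also can be shown that, for all $3K_n + 1 \le l \le n - 1$,

(5.17)  $$\begin{aligned}
(\mathrm{VIII}) &\le C E\Bigl\{\Bigl(\sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j(k) e_{j+1}\Bigr)' R^{-1}(k) R^{(n)}(k) R^{-1}(k) \sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\}^{q/2} \\
&\le C \|R^{(n)}(k)\|^{q/2} \|R^{-1}(k)\|^q E\Bigl\|\sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|^q \\
&\le C (N^{q/2} k^{q/2}),
\end{aligned}$$

where $R^{(n)}(k) = E(\mathbf{x}_l^{(n)}(k) \mathbf{x}_l^{(n)\prime}(k))$, the first inequality is guaranteed by the
independence between $\mathbf{x}_l^{(n)}(k)$ and $\sum_{j=K_n}^{l-2K_n-1} \mathbf{x}_j(k) e_{j+1}$, Wei ([29], Lemma 2)
and an argument similar to that used for verifying (3.25) and (3.26) of [17];
the third inequality follows from (5.2), Remark 1 and the fact that
$\max_{1 \le k \le K_n} \|R^{(n)}(k)\| \le C$, which is ensured by $\sum_{j=0}^{\infty} |b_j| < \infty$. [Note that (5.17)
is valid even under a weaker moment assumption, $\sup_{-\infty < t < \infty} E|e_t|^q < \infty$, $q >
2$.] By the convexity of $x^{q/2}$, $x > 0$, and (5.15)-(5.17), we have, for all $1 \le k \le K_n$,

(5.18)  $$(\mathrm{V}) \le C\Bigl\{\Bigl(k \sum_{j=K_n+1}^{\infty} b_j^2\Bigr)^{q/2} + 1\Bigr\}.$$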



Consequently, the desired property follows from (5.18) and the fact that 
nJ^'jLn+ib'j is uniformly bounded. □ 
Remark 4. Note that when $q = 2$, Lee and Karagrigoriou [21] introduced
a decomposition for

$$E\Bigl(\Bigl\|\frac{1}{\sqrt{N}} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|_{R^{-1}(k)}^2 - k\sigma^2\Bigr)^q,$$

which is similar to those given by (5.6), (5.11) and (5.15). By applying
this decomposition, they established (5.5) for the special case of $q = 2$ under
(K.1)(c) with $\{e_t\}$ being a sequence of i.i.d. random variables, $E|e_1|^4 <
\infty$ and $K_n = o(n^{1/2})$. See Lemma 2.3 of [21] for more details.
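A small Monte Carlo experiment can make the $q = 2$ case of (5.5) concrete. The sketch below is illustrative only and rests on assumptions that go beyond the text: the data come from a Gaussian AR(1) process with $\sigma^2 = 1$ (so that $R(k)$ has the closed form $\phi^{|i-j|}/(1-\phi^2)$), $N = n - K_n$, $K_n$ is the integer part of $n^{1/2}$, and (5.5) is read as a bound on $k^{-q/2} E|\,\|N^{-1/2}\sum_j \mathbf{x}_j(k)e_{j+1}\|^2_{R^{-1}(k)} - k\sigma^2|^q$. The function names are hypothetical.

```python
import numpy as np

def centered_form(n, K, k, phi, rng, burn=100):
    """One draw of ||N^{-1/2} sum_j x_j(k) e_{j+1}||^2_{R^{-1}(k)} - k for a Gaussian AR(1)."""
    e = rng.standard_normal(n + burn + 1)
    x = np.zeros(n + burn + 1)
    for t in range(1, len(x)):
        x[t] = phi * x[t - 1] + e[t]           # x[t] depends only on e[0..t]
    x, e = x[burn:], e[burn:]                  # keep indices 0..n
    N = n - K
    S = np.zeros(k)
    for j in range(K, n):                      # j = K_n, ..., n-1
        S += x[j - np.arange(k)] * e[j + 1]    # x_j(k) = (x_j, ..., x_{j-k+1})'
    S /= np.sqrt(N)
    gamma = phi ** np.arange(k) / (1.0 - phi ** 2)   # AR(1) autocovariances, sigma^2 = 1
    lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    R = gamma[lags]                            # k x k Toeplitz covariance matrix R(k)
    return float(S @ np.linalg.solve(R, S) - k)

def scaled_moment(n, k, phi, q=2, reps=200, seed=0):
    """Monte Carlo estimate of k^{-q/2} E|centered quadratic form|^q."""
    rng = np.random.default_rng(seed)
    K = int(np.sqrt(n))
    draws = np.array([centered_form(n, K, k, phi, rng) for _ in range(reps)])
    return float(np.mean(np.abs(draws) ** q) / k ** (q / 2))

for k in (1, 5, 10, 14):
    print(k, scaled_moment(200, k, 0.5))
```

If the scaled moments stay of comparable size as $k$ ranges over $1, \ldots, K_n$, that is consistent with the uniform bound asserted by the lemma; a simulation of this kind cannot, of course, replace the proof.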

As a direct application of Lemma 3, we get the following result. 

Lemma 4. Assume (K.l)(b), (K.5), sup_ 0O<t<0O E\e t \ 2q < oo for some 
q>2 and K n = 0{n 1 / 2 ). Then 



(5.19) lim E\ max 

n->oo \ Kk<K r 



^ n— 1 

X J2 x i( fc ) e i+i 



--K n 



R- 1 (k) 



0. 



Proof. By Lemma 3 and an argument similar to that used for obtain- 
ing (3.5) of [27], (5.19) follows. □ 

Under (K.1)(a) with i.i.d. Gaussian noise, (K.5) and $K_n = o(n^{1/2})$, Shibata
([27], (3.5)) obtained, for any $\varepsilon > 0$,

(5.20)  $$\lim_{n \to \infty} P\Bigl(\max_{1 \le k \le K_n} \Bigl|\frac{\|\hat{\mathbf{a}}_n(k) - \mathbf{a}\|_R^2}{L_n(k)} - 1\Bigr| > \varepsilon\Bigr) = 0,$$

where $\hat{\mathbf{a}}_n(k)$ in (5.20) is viewed as an infinite-dimensional vector $(\hat{a}_{1,n}(k),
\hat{a}_{2,n}(k), \ldots)'$ with $\hat{a}_{i,n}(k) = 0$ for $i > k$, and $P$ denotes the probability measure.
This leads to a lower bound theorem for model selection in independent-realization
settings (see Theorem 3.2 of [27]), which serves as an important
vehicle for establishing $S_n(k)$'s (and its variants') asymptotic optimality
in the sense of (4.11). Recently several authors have attempted to
establish (5.20) in non-Gaussian settings. Among them, Lee and Karagrigoriou
[21] attained this goal by imposing (K.5), the assumptions described
in Remark 4, and for all integer $t$ and all positive integers $k$, $j$ with $k \ge j$,

(5.21)  $$E w_{t,j}^4(k) \le C_1,$$

where $C_1 > 0$ is independent of $t$, $j$ and $k$, and $(w_{t,1}(k), w_{t,2}(k), \ldots, w_{t,k}(k))' =
\mathbf{w}_t(k) = R^{-1/2}(k) \mathbf{x}_t(k)$. Their moment assumption on $\{e_t\}$, $E|e_1|^4 < \infty$ (see
Remark 4), is considerably weaker than others proposed in the literature.
But (5.21) does not seem to be needed. This is because $w_{t,j}(k)$ is a linear
combination in $\{e_l, l \le t\}$, and by an argument similar to that used for
showing (5.7), (5.21) holds automatically when the other assumptions are
imposed. In fact, by (5.6), (5.9)-(5.11), (5.14)-(5.17) and an idea of Lee and
Karagrigoriou ([21], Lemma 2.4), (5.20) can be ensured by a set of weaker
assumptions, (K.1)(b), (K.5), $\sup_{-\infty < t < \infty} E|e_t|^4 < \infty$ and $K_n = o(n^{1/2})$. However,
to obtain asymptotic expressions for unconditional MSPEs of the least
squares predictors with estimated orders, we require a strengthened version
of (5.20). The following lemma is given to fulfill this aim.

Lemma 5. Let the assumptions of Proposition 2 and (K.5) hold. Then,
for any $q > 0$,

(5.22)  $$\lim_{n \to \infty} E\Bigl(\max_{1 \le k \le K_n} \Bigl|\frac{\|\hat{\mathbf{a}}_n(k) - \mathbf{a}\|_R^2}{L_n(k)} - 1\Bigr|^q\Bigr) = 0,$$

where $L_n(k)$ is defined in Proposition 2 of Section 2.

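Proof. Note that

(5.23)  $$\frac{\|\hat{\mathbf{a}}_n(k) - \mathbf{a}\|_R^2}{L_n(k)} - 1
= \frac{\|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{R(k)}^2 - k\sigma^2/N}{L_n(k)}
= \frac{A(k) + B(k) + C(k) + D(k)}{L_n(k)},$$

where

$$A(k) = \Bigl\|\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1,k}\Bigr\|^2_{\hat{R}_n^{-1}(k) R(k) \hat{R}_n^{-1}(k) - R^{-1}(k)},$$

$$B(k) = \Bigl(\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j'(k)(e_{j+1,k} - e_{j+1})\Bigr) R^{-1}(k) \Bigl(\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k)(e_{j+1,k} - e_{j+1})\Bigr),$$

$$C(k) = 2\Bigl(\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j'(k)(e_{j+1,k} - e_{j+1})\Bigr) R^{-1}(k) \Bigl(\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1}\Bigr)$$

and

$$D(k) = \Bigl\|\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|_{R^{-1}(k)}^2 - \frac{k\sigma^2}{N}.$$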
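The algebra behind (5.23) can be traced in a few lines. The following is a sketch under assumptions that are not restated in this excerpt: $L_n(k) = k\sigma^2/N + \|\mathbf{a} - \mathbf{a}(k)\|_R^2$ (as in Proposition 2), and the least squares estimate satisfies $\hat{\mathbf{a}}_n(k) - \mathbf{a}(k) = \mp \hat{R}_n^{-1}(k) N^{-1} \sum_j \mathbf{x}_j(k) e_{j+1,k}$ (the sign depends on the paper's convention and does not affect the quadratic forms). The shorthand $\mathbf{S}_k$, $\mathbf{T}_k$, $\mathbf{U}_k$ is introduced here only for illustration.

```latex
% Shorthand (hypothetical, for this sketch only):
%   S_k = N^{-1} \sum_j x_j(k) e_{j+1,k},
%   T_k = N^{-1} \sum_j x_j(k) (e_{j+1,k} - e_{j+1}),
%   U_k = N^{-1} \sum_j x_j(k) e_{j+1},   so that  S_k = T_k + U_k.
\begin{align*}
\|\hat{\mathbf{a}}_n(k) - \mathbf{a}\|_R^2
  &= \|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{R(k)}^2 + \|\mathbf{a} - \mathbf{a}(k)\|_R^2
  && \text{(the cross term vanishes by the normal equations for } \mathbf{a}(k)\text{)} \\
\|\hat{\mathbf{a}}_n(k) - \mathbf{a}\|_R^2 - L_n(k)
  &= \|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{R(k)}^2 - \frac{k\sigma^2}{N} \\
\|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{R(k)}^2
  &= \mathbf{S}_k' \hat{R}_n^{-1}(k) R(k) \hat{R}_n^{-1}(k) \mathbf{S}_k \\
  &= \underbrace{\mathbf{S}_k' \bigl(\hat{R}_n^{-1}(k) R(k) \hat{R}_n^{-1}(k) - R^{-1}(k)\bigr) \mathbf{S}_k}_{A(k)}
     + \mathbf{S}_k' R^{-1}(k) \mathbf{S}_k \\
\mathbf{S}_k' R^{-1}(k) \mathbf{S}_k
  &= \underbrace{\mathbf{T}_k' R^{-1}(k) \mathbf{T}_k}_{B(k)}
     + \underbrace{2\, \mathbf{T}_k' R^{-1}(k) \mathbf{U}_k}_{C(k)}
     + \underbrace{\mathbf{U}_k' R^{-1}(k) \mathbf{U}_k - \frac{k\sigma^2}{N}}_{D(k)}
     + \frac{k\sigma^2}{N}.
\end{align*}
```

Dividing the second line by $L_n(k)$ and substituting the last two lines gives the ratio in (5.23).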






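By Lemmas 1 and 2 and Proposition 1, one has, for any $q > 0$ and all
$1 \le k \le K_n$,

(5.24)  $$E|A(k)|^q \le E\Bigl\{\Bigl\|\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1,k}\Bigr\|^{2q} \|\hat{R}_n^{-1}(k) R(k) \hat{R}_n^{-1}(k) - R^{-1}(k)\|^q\Bigr\} \le C \frac{k^{2q}}{N^{3q/2}},$$

(5.25)  $$E|B(k)|^q \le E\Bigl\|\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k)(e_{j+1,k} - e_{j+1})\Bigr\|^{2q} \|R^{-1}(k)\|^q \le C \frac{k^q \|\mathbf{a} - \mathbf{a}(k)\|_R^{2q}}{N^q}$$

and

(5.26)  $$E|C(k)|^q \le 2^q E\Bigl\{\Bigl\|\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1}\Bigr\|^q \Bigl\|\frac{1}{N} \sum_{j=K_n}^{n-1} \mathbf{x}_j(k)(e_{j+1,k} - e_{j+1})\Bigr\|^q\Bigr\} \|R^{-1}(k)\|^q \le C \frac{k^q \|\mathbf{a} - \mathbf{a}(k)\|_R^q}{N^q}.$$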

Now, according to (5.23)-(5.26) and Lemma 3, we have, for sufficiently large 



k=i 



a ||fl _ -. 

L n (k) 



(5.27) 



<^E 



V k 2q | feg||a-a(fc)l|^ | fc^ 2 



^\N^/ 2 L q n {k) NlL q n {k) NlLl{k) 



k= 



+ E^+ E ^ ]= (D, 



^ l AT<?/2 

=lV v fc=l n fc=fc*+l 

where the second inequality is ensured by 
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and NL n {k) > Cmax{fc,fc*}, and the equality is guaranteed by A;* — > oo. 
Consequently, (5.22) is ensured by (5.27) and Jensen's inequality. □ 



Under (K.1)(a) with i.i.d. Gaussian noise, Shibata ([27], Lemmas 4.2, 4.3
and 4.4 and Proposition 4.1) established that, for all $1 \le k, j \le K_n$ with
$K_n \le n - 1$,

(5.28)  $$E|S_k^2 - \sigma_k^2 - (S_j^2 - \sigma_j^2)|^4 \le C (\max\{k, j\})^{1/2} N^{-2} \|\mathbf{a}(j) - \mathbf{a}(k)\|^4,$$

where $S_k^2 = (1/N) \sum_{t=K_n}^{n-1} e_{t+1,k}^2$, $\mathbf{a}(j)$ and $\mathbf{a}(k)$ are viewed as infinite-dimensional
vectors and $\sigma_k^2 = E(S_k^2)$. [Since, by Remark 1, $D_l \|\mathbf{a}(j) - \mathbf{a}(k)\|^2 \le
\|\mathbf{a}(j) - \mathbf{a}(k)\|_R^2 \le D_u \|\mathbf{a}(j) - \mathbf{a}(k)\|^2$ for some $0 < D_l \le D_u < \infty$ independent
of $j$ and $k$, $\|\mathbf{a}(j) - \mathbf{a}(k)\|^4$ in (5.28) can be replaced by $\|\mathbf{a}(j) - \mathbf{a}(k)\|_R^4$.]

However, his approach, based on heavy calculations for the cross moments
of Gaussian random variables, is not easy to extend to higher-moment cases.
Moreover, the term $(\max\{k, j\})^{1/2}$ is cumbersome because it is difficult to
infer how this term varies with the exponent on the left-hand side of (5.28).
Lemma 6 below not only shows that this term is not needed, but also provides
a general bound valid for each $q > 2$. For some applications of Lemma 6,
see Remark 6 and the proofs of Theorems 2 and 3.



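Lemma 6. Assume (K.1)(a), $\sup_{-\infty < t < \infty} E|e_t|^{2q} < \infty$ with $q > 2$ and
$K_n \le n - 1$. Then

(5.29)  $$\max_{1 \le k \le K_n} E|S_k^2 - \sigma_k^2|^q \le C N^{-q/2},$$

and for all $1 \le k, j \le K_n$,

(5.30)  $$E|S_k^2 - \sigma_k^2 - (S_j^2 - \sigma_j^2)|^q \le C N^{-q/2} \|\mathbf{a}(j) - \mathbf{a}(k)\|_R^q.$$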



Proof. First observe that 



n-1 

E 

t=K n 



e t+l,k 



E(e 2 



Since, according to Remark 1 and the definition of & [given after (2.2)], 
e tj k, t = . . . , —1, 0, 1, ... , is a linear process, by Findley and Wei ([11], the 
first moment bound theorem) one has, for all 1 < k < K n , 



(5.31) 



1 I n— 1 n— 1 
E\£%-4\*<C—l Yl E [E(e i+ i >k e j+1>k )f 

> i=K n j=K n 



q/2 



<c 



1 

AT9/2 



E ef 



q/2 
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where ^ = E(e i+1>k e j+lik ). By Remark 1, 



' = — oo \ 



j=0 

(fc > 
EM fc )i 
3=0 



2 \ 2 

/(A)) 



where ao(k) = 1. By Berk ([4], Lemma 4), 

fc 

sup e < °°- 



0<fc<oo 



i=0 



This fact and (5.31) yield (5.29). 

To obtain (5.30), assume k <j. Then 

\ s k-°k- ( S j ~ a j)\ 

Y n—l 



(5.32) 



— E e t+i,k ~ l|a(j) - a(fc)||| - ef +lx 



< 



N 
1 

N 
+ 



t=K n 
n-1 



X] ( e m,fc - et+i,i)et+i,fc - ||a(j) - a 



-, n-1 



E ( e t+i,k - et+ij)e t+ i ! 



By Findley and Wei ([11], the first moment bound theorem) again, the 
expectation of the first term on the right-hand side of (5.32) is bounded 
by 

/ n-1 n-1 \ <?/2 

CN~ q [ J2 E E((e tl+hk - et 1+ i,j)(e t2+ljk - et 2 +ij))^*_t 2 ] 

\ti=K n t 2 =K n / 



q/2 



<CN-^ 2 {E(e hk -e hj ) 2 r/ 2 [ E < CN^Mj) - a 



9 



where the second inequality follows from E(e\^ — e±j) 2 = ||a(j) — a(/c)||^ 
and 

oo / k oo \ 2 

E ISI<tf(EK(*)lEMJ < a 

j=— oo V i=0 J =0 / 
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Similarly, an upper bound for the expectation of the second term on the 
right-hand side of (5.32) is also given by CN~ q / 2 \\a.(j) — a.(k)\\ q R . As a result, 
(5.30) holds. □ 



Remark 5. Assume $\{e_t\}$ in Lemma 6 is a sequence of martingale differences
corresponding to an increasing sequence of $\sigma$-fields of events, $\{\mathcal{F}_t\}$.
Further, assume that

$$E(e_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2 \quad \text{a.s.}$$

for $t = \ldots, -1, 0, 1, \ldots$, and

$$\sup_{-\infty < t < \infty} E(|e_t|^{2q} \mid \mathcal{F}_{t-1}) \le C < \infty \quad \text{a.s.}$$

Then by the same argument as in Lemma 6, (5.29) and (5.30) are still valid.



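According to the decomposition of $S_n(k)$ given by (4.1) of [27],

(5.33)  $$S_n(k) = N L_n(k) + 2k(\hat{\sigma}_k^2 - \sigma^2) + \bigl(k\sigma^2 - N\|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{\hat{R}_n(k)}^2\bigr) + N\sigma^2 + N(S_k^2 - \sigma_k^2).$$

Equality (5.33) yields

(5.34)  $$P(\hat{k}_n^S = k) \le P(S_n(k) \le S_n(k_n^*)) = P\Bigl(\frac{S_n(k)}{N L_n(k)} \le \frac{S_n(k_n^*)}{N L_n(k)}\Bigr) \le \sum_{i=1}^{5} P\bigl(|V_{in}(k)| \ge (1/5) V_n(k)\bigr),$$

where

$$V_{1n}(k) = \frac{2k(\hat{\sigma}_k^2 - \sigma^2)}{N L_n(k)}, \qquad
V_{2n}(k) = \frac{2k_n^*(\hat{\sigma}_{k_n^*}^2 - \sigma^2)}{N L_n(k)},$$

$$V_{3n}(k) = \frac{k\sigma^2 - N\|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{\hat{R}_n(k)}^2}{N L_n(k)}, \qquad
V_{4n}(k) = \frac{k_n^*\sigma^2 - N\|\hat{\mathbf{a}}_n(k_n^*) - \mathbf{a}(k_n^*)\|_{\hat{R}_n(k_n^*)}^2}{N L_n(k)},$$

$$V_{5n}(k) = \frac{S_k^2 - \sigma_k^2 - (S_{k_n^*}^2 - \sigma_{k_n^*}^2)}{L_n(k)}$$

and

$$V_n(k) = \frac{L_n(k) - L_n(k_n^*)}{L_n(k)}.$$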

By (5.34), Chebyshev's inequality and Lemmas 1-6, we can obtain an
upper bound for $P(\hat{k}_n^S = k)$ through moment bounds for $V_{in}(k)/V_n(k)$, $i =
1, \ldots, 5$. This is our motivation for verifying that, for any $r > 0$,

(5.35)  $$\lim_{n \to \infty} E\Bigl|\frac{L_n(\hat{k}_n^S)}{L_n(k_n^*)} - 1\Bigr|^r = 0;$$

see Theorem 3 below for more details. Equality (5.35), which provides a
(general) moment convergence result for $L_n(\hat{k}_n^S)/L_n(k_n^*)$, is the key to proving
Theorem 1, and can be applied to verify (5.74) and (5.75), which are the
final steps in the proof of Theorem 2. As a byproduct, (5.35) also yields $\hat{k}_n^S$'s
asymptotic efficiency for independent-realization predictions in the sense of
Shibata [27], as defined in (4.11). To see this, first notice that, for any random
variable $I_n \in J_n$ [$J_n$ is defined after (4.6)],



E {(Vn+i -y n +i(L)) \xi, 
It is also not difficult to show that 



1 



max 

Kk<K r 



L n {k) 



< inf 

1 n 



\&-n{In) 



l^-n(^n) 



\R 



(5.36) 



< 



< 



inf 



L n {kn) 



|a n (/s* ) 



\R 



Ln(kn) 



Since by (5.20) both sides of (5.36) converge to 1 in probability, (4.11) can 
be rewritten as 



(5.37) 



L, 



L n (kn, 



l = Op(l) 



or 



\&-n(k T: 



\R 



Obviously 
(5.38) 



Ln{k s n ) 



L n {kn) 

is an immediate consequence of (5.35 



L n {kn) 



l = Op(l) 



1 



o P (K 



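Theorem 3. Let the assumptions of Proposition 2 and (K.5) hold. Then
(5.35) holds.

Proof. Let $\varepsilon > 0$. By (5.34),

(5.39)  $$\begin{aligned}
E\Bigl|\frac{L_n(\hat{k}_n^S)}{L_n(k_n^*)} - 1\Bigr|^r
&\le \varepsilon^r + \sum_{k \in A_{\varepsilon,n}} \Bigl(\frac{L_n(k)}{L_n(k_n^*)} - 1\Bigr)^r P(\hat{k}_n^S = k) \\
&\le \varepsilon^r + \sum_{i=1}^{5} \Bigl\{\sum_{k \in A_{\varepsilon,n}} \Bigl(\frac{L_n(k)}{L_n(k_n^*)} - 1\Bigr)^r P\bigl(|V_{in}(k)| \ge (1/5) V_n(k)\bigr)\Bigr\},
\end{aligned}$$

where

(5.40)  $$A_{\varepsilon,n} = \Bigl\{k : 1 \le k \le K_n,\ \frac{L_n(k)}{L_n(k_n^*)} - 1 > \varepsilon\Bigr\}.$$

To obtain (5.35), it suffices to show that the value inside the braces of (5.39)
is asymptotically negligible in the sense that

(5.41)  $$\lim_{n \to \infty} \sum_{k \in A_{\varepsilon,n}} \Bigl(\frac{L_n(k)}{L_n(k_n^*)} - 1\Bigr)^r P\bigl(|V_{in}(k)| \ge (1/5) V_n(k)\bigr) = 0,$$

for $i = 1, \ldots, 5$.

By (4.2) of [27] and some algebraic manipulations,

(5.42)  $$\begin{aligned}
|\hat{\sigma}_k^2 - \sigma^2| &\le \|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{\hat{R}_n(k)}^2 + \|\mathbf{a}(k) - \mathbf{a}\|_R^2 + |S_k^2 - \sigma_k^2| \\
&\le (\|\hat{R}_n(k) - R(k)\| \|R^{-1}(k)\| + 1) \|\hat{\mathbf{a}}_n(k) - \mathbf{a}\|_R^2 + |S_k^2 - \sigma_k^2|.
\end{aligned}$$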
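The two inequalities in (5.42) follow from standard least squares identities. The sketch below assumes, as is usual for this setup but not restated in the excerpt, that $\hat{\sigma}_k^2 = S_k^2 - \|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{\hat{R}_n(k)}^2$ (residual variance at the least squares estimate) and $\sigma_k^2 = \sigma^2 + \|\mathbf{a} - \mathbf{a}(k)\|_R^2$ (population one-step error variance of the order-$k$ predictor); the vector $\mathbf{v}$ is shorthand introduced only here.

```latex
\begin{align*}
\hat{\sigma}_k^2 - \sigma^2
  &= (S_k^2 - \sigma_k^2) + \|\mathbf{a} - \mathbf{a}(k)\|_R^2
     - \|\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\|_{\hat{R}_n(k)}^2,
  && \text{so the first line of (5.42) is the triangle inequality.} \\
\intertext{For the second line, write $\mathbf{v} = \hat{\mathbf{a}}_n(k) - \mathbf{a}(k)$:}
\|\mathbf{v}\|_{\hat{R}_n(k)}^2
  &= \mathbf{v}' R(k) \mathbf{v} + \mathbf{v}' (\hat{R}_n(k) - R(k)) \mathbf{v}
   \le \bigl(1 + \|\hat{R}_n(k) - R(k)\| \, \|R^{-1}(k)\|\bigr) \|\mathbf{v}\|_{R(k)}^2, \\
\|\mathbf{v}\|_{R(k)}^2 + \|\mathbf{a}(k) - \mathbf{a}\|_R^2
  &= \|\hat{\mathbf{a}}_n(k) - \mathbf{a}\|_R^2
  && \text{(orthogonality of } \mathbf{a}(k) - \mathbf{a} \text{ to order-}k\text{ vectors in the } R\text{-inner product).}
\end{align*}
```

Since the factor $1 + \|\hat{R}_n(k) - R(k)\|\|R^{-1}(k)\|$ is at least 1, it can be applied to both squared norms, which gives the second line of (5.42).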

By Ing and Wei ([17], Lemma 2), (5.22), (5.29), (5.42) and Holder's inequal- 
ity, one has, for any q > and all 1 < k < K n , 

(5.43) mn(fc)r < C Ql + A r-^. 



Since 



and 



L n (k) f CL- l (k* n ), if 1 < < A;*, 



^(fc)^— forA:eA e , n ,, 
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by the Chebyshev inequality and (5.43), we have, for large q, 
L n {k) 



E 



1 P(\V ln (k)\ > (l/5)V n (k)) 



<c E 

,n 



L "( fc ) Vy-(g-r-) 



(5.44) 



1 + e 



r k* 
q—r I h n 



>(*) 



+ 



E 



fc 9 



+ 



x ,k* r W-r k* r Nl/ 2 -r 



Kn 

+ E 

k=k*+l 



k q+r ¥ 



N q Nil 2 



oil). 



Therefore, (5.41) holds for i = 1. Similarly, we also have, for any q > and 
all 1 < k < K n , 



(5.45) 



E\V 2n (k)\ q <C^ + N^ 2 



By (5.45) and the same argument used for verifying (5.44), (5.41) holds for 
i = 2. 

For i = 3, we have 



\V 3n (k)\ < 



1 



+ 



(5.46) 



L n (fc) 

\&n(k) - a(k)\\ 2 AAk) - \\a n (k) - a(fc)||| (fc) 



L n {k) 



< 



|a n (/c) a ||/j 



L n (k) 

\\R n (k)-R(k)\\\\R-Hk)\\\\k n (k)- a \\l 
L n (k) 

By Ing and Wei ([17], Lemma 2) and (5.23)-(5.26), we have, for any q > 
and all 1 < k < Kn , 



(5.47) 



£|^3n(*0| 9 <C 



k q kil 2 
+ 



Now, by taking a sufficiently large g 
L n (k) 



E 



Nil 2 NiL q n (k)J' 

1 

1 ) P(|^3»(*)| > (l/5)F n (fc)) 
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e.n 



Ln(K)J n y J \Ni/ 2 WL q n {k) 



^ ^f 1 + e Y~ r fs^f kq kQ/2 

- C i 2^ [ u* r ATo/2-r 4 



+ t f^ + O^ 

k=k*+l 
= 0(1). 

Consequently, (5.41) holds for i = 3. Similarly, we can also show that, for 
any q > and all 1 < k < K n , 

(5.48) ^ 4 „ W |,< C (_|_ + _g_), 

and, hence, (5.41) holds for z = 4. 

Since (K.3) is assumed, by (5.30) one has, for any q > and all 1 < k < K n , 

(5 . 49) E|F5 „ wl ,< c mz«. 

This gives for large q, 
' L n (k 
,L n (k, 



E (t^- 1 ) P(|%n(fc)| >(l/5)^(fc)) 



fc£A E 



< c /l + ey- r |^ ||a(fc)-a(fc*)||^ 



C 



, ll a a (^)llii [_ ||a a(fc*)||^ 



\ e / \k=l k=k*+l / 

= o(l). 

Hence, (5.41) holds for i = 5. Consequently (5.35) follows. □ 

Remark 6. In this remark we show that (5.38) can be directly verified
[without the help of (5.35)] under much weaker conditions than those of
Theorem 3. [Recall that the main purpose of Theorem 3 is to provide a
general moment bound for $L_n(\hat{k}_n^S)/L_n(k_n^*)$, and (5.38) is only its byproduct.]
To see this, first notice that, by assuming (K.1)(a), $k_n^* \to \infty$ as $n \to \infty$ and
the Gaussianity of $\{e_t\}$, Shibata ([27], Proposition 4.1) proved that

(5.50)  $$\max_{1 \le k \le K_n} |V_{5n}(k)| = o_p(1).$$

As can be seen from [27] and [21], (5.50) and (5.20) are the two most important
tools for verifying (5.38). However, by (5.30), (5.49) and the assumption
that $k_n^* \to \infty$ as $n \to \infty$, one has, for $q > 2$,

$$\sum_{k=1}^{K_n} E|V_{5n}(k)|^q \le C\Bigl\{\sum_{k=1}^{k_n^*} \{N L_n(k_n^*)\}^{-q/2} + \sum_{k=k_n^*+1}^{K_n} \{N L_n(k)\}^{-q/2}\Bigr\} = o(1),$$

which in turn implies (5.50). As a result, (5.50) still follows if the Gaussian
assumption on $\{e_t\}$ is replaced by $\sup_{-\infty < t < \infty} E|e_t|^q < \infty$, $q > 4$. This fact,
a result given before Lemma 5 [which shows that (5.20) can be guaranteed by
a set of rather mild assumptions], (5.29) and the same argument as the one
given in Theorem 4.1 of [27] together yield that (K.1)(b), (K.5), $K_n = o(n^{1/2})$
and $\sup_{-\infty < t < \infty} E|e_t|^q < \infty$, $q > 4$, are sufficient to confirm (5.38). Recently,
under model (1.1) with i.i.d. but non-Gaussian noise, Bhansali [5], Karagrigoriou
[19] and Lee and Karagrigoriou [21] also obtained (5.38). However, all
these papers required a more stringent moment assumption, $E|e_1|^q < \infty$
with $q$ larger than or equal to 8; see Section 6 for more discussion.

The following corollary deals with the moment properties of $L_n(\hat k_n)$ with
$\hat k_n = \hat k_n^A$, $\hat k_n^F$, $\hat k_n^{S_p}$ and $\hat k_n^C$.

Corollary 2. Let the assumptions of Theorem 3 hold. Then (5.35)
holds with $\hat k_n^S$ replaced by $\hat k_n^A$, $\hat k_n^F$, $\hat k_n^{S_p}$ or $\hat k_n^C$.

Proof. Define $G_n^{(1)}(k) = S_n(k) - N\exp(\mathrm{AIC}(k))$, $G_n^{(2)}(k) = S_n(k) - N(\mathrm{FPE}(k))$,
$G_n^{(3)}(k) = S_n(k) - N(S_p(k))$ and $G_n^{(4)}(k) = S_n(k) - C_p(k)$. By
arguments similar to those in Theorem 3 and Shibata ([27], Theorem 4.2),
(5.41) still holds with $|V_{in}(k)|$ replaced by

$$\frac{|G_n^{(j)}(k) - G_n^{(j)}(k_n^*)|}{N L_n(k)},$$

$j \in \{1,2,3,4\}$, or with 1/5 replaced by any positive number independent of
$n$. Viewing the proof of Theorem 3, the claimed properties are guaranteed
by this finding. □

We are now in position to prove Theorem 1. 






Proof of Theorem 1. First note that 



E{y n+1 - y n+1 (k n )) 2 - a 7 



(5.51) 



E n \k* : 



E 



E n {kn) 



+ E 



Ln(k*) / 



where $\hat k_n = \hat k_n^S$, $\hat k_n^A$, $\hat k_n^F$, $\hat k_n^{S_p}$ or $\hat k_n^C$. By (5.22), Theorem 3 and Corollary 2,
the first expectation on the right-hand side of (5.51) converges to 0, whereas 
the second expectation converges to 1. As a result, Theorem 1 follows. □ 

To obtain Theorem 2, we still need the following two lemmas, Lemmas 
7 and 8. 

Lemma 7. Suppose that the assumptions of Proposition 2 hold. Then, 
for any q > 0, 



(5.52) 



lim El max 

n->oo \i<k<K n 



f(fc)-fi(A0 



0. 



where $f(k)$ is defined after (2.3) and, for $n > \sqrt{n} + K_n + 1$ and $\sqrt{n} > 2K_n$,

. n—y/n—l 

f 1 (k) = ^(k)R- 1 (k)- 



j=K n 



With (k) — (x^, ■ ■ ■ , 3^— fc+i) — (J2r=0 br e n-n • ■ • )Er=0 b r £n-k+l- 



Proof. By (K.4) we can assume that $n > \sqrt{n} + K_n + 1$ and $\sqrt{n} > 2K_n$
without loss of generality. To obtain (5.52), we first show that, for any $q > 0$,



(5.53) 
where 



lim El max 

n-»OQ \l<k<K n 



f (A:) - fo(fc) 



L l J\k) 



0. 



.. n — y/n — 1 

fo (k) = x*' (A;) J?° (k) — J2 *j(k)e j+ltk 



j=K n 



with 



n— y/n— 1 



K(k) 



N 



E MkWj(k). 



j=K n 
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Set q > 2/3. By Proposition 1, Lemmas 1 and 2, Holder's inequality and 
an argument similar to that used for obtaining (3.13) of [17], 



(5.54) 



E\ max 

A<k<K. 



f(*0-fo(*0 



Li /2 (k) 

<C£L^ 2 (z) 

S(||x n (0-x;«||^) J B(|| J R- 1 (z)|| 3 «) 



A„ 



2=1 



x El 



n-l 



N- 1 x,(i)e J+ i, 

j=K n 



3q S 



1/3 



+ 



E(\K(i)\\^)E(\\R-\t) - R°-\i)\\ 



3q , 



X £ 



n-l 



j=A n 



3 9 N 



1/3 



+ 



E(\\ x um 3q )E(\\K 1 m 3q ) 



X £ 



3<n 



1/3 . 



n-l 

N- 1 x i(») e j+l,i 
j=n-y/n 

oo \ 5/2 

J>vV2-A„+l / 

+ i^N'^ + i<? AT- 31 ?/ 4 



By observing that $L_n^{-q/2}(i) \le (i/N)^{-q/2}$, the right-hand side of (5.54) is
bounded by



i=l 



K„ 



(5.55) 



i=l 



E 



9/2 



+i 39/2 iv -3 ? /4 + . g /2 iv - ? /4 



j>y/E/2-K n +l 



Moreover, since (K.l)(b) implies that n^,> n ^ 2 = o(l), by taking sufficiently 
large q, (5.55) converges to 0, and hence (5.53) holds for sufficiently large q. 
This result and Jensen's inequality further guarantee that (5.53) is valid for
any $q > 0$.

In view of (5.53), (5.52) follows if one can show that, for any $q > 0$,

f (Ao-fi(fc)r 



(5.56) 



lim E\ max 

n->oo \l<k<K r , 



0. 



By Proposition 1, Lemmas 1 and 2 and an argument similar to that used 
for showing (3.21) of [17], one has, for sufficiently large q, 



E\ max 

A<k<K n 



fo(fc)-fl(fc) 



L\l 2 {k) 



x < g £ ,ft(*)-fi(i) 



i=l 



Ln /2 (i) 



i=l 



which, together with Jensen's inequality, yields (5.56). □ 



Lemma 8. Assume (K.1)(a), $\sup_{-\infty<t<\infty} E|e_t|^{2q} < \infty$, $q > 1$, and $K_n = o(n^{1/2})$.
Then, for $n > \sqrt{n} + K_n + 1$ and $\sqrt{n} > 2K_n$,



(5.57) 



Proof. Without loss of generality, assume that $1 \le i < l \le K_n$. First
observe that

1 n— s/n— 1 

f 1 (f)-f 1 (/) = x;'(f) J R- 1 (0^ E te(0-*;(0te+i 



n— 1 



+ ( x ;(z)-x;(0) / ^ 1 (0^ E x i(0e j+ i 



(5.58) 



j=K n 



+x; i '(i)( J R- 1 «- J R- 1 (/))- E ^'W^+i 



(I) + (II) + (III), 



where $x_n^*(i)$, $x_j(i)$ and $R^{-1}(i)$, respectively, are regarded as $l$-dimensional
vectors and an $l \times l$ matrix with undefined entries set to 0. To obtain (5.57),
it suffices to show that, for all $1 \le i < l \le K_n$,

$$E|(T)|^{2q} \le C\left(\frac{l-i}{N}\right)^{q}$$

for all $T$ = (I), (II) and (III). (Note that, as mentioned before Proposition 1,
$C$ is used to denote some positive number independent of $i$, $l$, $K_n$ and $n$.)

Since $x_n^*(i)$ is independent of the remaining part of (I), by an argument
similar to that used for showing (5.17), one has, for all $1 \le i < l \le K_n$,



E(\(l)\ 2q )<CE 



X E -x i (/))e i+ i 



(5.59) 



j=K„ 

<C\\R^(l)\\ 2q \\Ri n \i)\\ q 



x E 



2q 



R- 1 (l)R^ (i)-^ -1 (0 



2g 



<C 



E ( x iW - x i(0)ej+i 

j=K„ 
l-i\ q 



N 



where Ri n \i) = i?(x* (i)x*'(i)) and the last inequality follows from Lemma 2 
and 

max \\R^\k)\\ <C, 

l<k<K n 

which is guaranteed by 1^1 < °°- Similarly, for all 1 < i < I < K n , 



E(\(lI)\ 2q )<CE 



— J2 x i(0ej+i 



3=K„ 



2q 



where 



and 



< CE 



D n (l,i) 



zJ n) (Z,z)e i+ i 



2q 



N 



j=K n 







(l—i)xi 



Ojx(Z-i) 

Ri n) (l-i) 



-f (M) = (Ci(M),---,^(M))' 

= (o a _ 4)XJ ,^ n)1/2 a-0) J R- 1 (Ox J (0- 

By an argument like that given in Lemma 4 of [17], we have, for all 1 < i < 



E 



^ n— \/n— 1 

E 



i=K n 



<C 



l-i\ q 1 



2g 



1 n~y/n.-l \ 
1 ^ y ( n )/'7 ,AI 2, 2 



E ^ E W 

r=l\ j=K n 



) 7 I 5 
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and for all 1 < i < I < K n , all 1 < r < I — i and all K n < j <n — y/n — 1 , 

E\z$(l,i)\ 2Q <C. 



As a result, for all $1 \le i < l \le K_n$,

$$E(|(\mathrm{II})|^{2q}) \le C\left(\frac{l-i}{N}\right)^{q} \tag{5.60}$$

follows.

Reasoning as for (5.59), 

E(\(III)\ 2q ) 

(5.61) 

n— y/n— 1 



<CE 



— x iW e i+i 



j=K r 



2q 



(B-i (i)-flr 1 (i)(.R-i (O-Br 1 ^)) 



i?(0 



holds for all $1 \le i < l \le K_n$, where $R_i^{-1}(l)$ is the upper left $i \times i$ block of
$R^{-1}(l)$, and $x_j(i)$ and $R^{-1}(i)$, returning to their own original definitions,
are an $i$-dimensional vector and an $i \times i$ matrix, respectively. If we write

$$R(l) = \begin{pmatrix} R(i) & R_{i,l-i}(l) \\ R_{l-i,i}(l) & R(l-i) \end{pmatrix},$$

then

$$R_i^{-1}(l) = \bigl(R(i) - R_{i,l-i}(l) R^{-1}(l-i) R_{l-i,i}(l)\bigr)^{-1}. \tag{5.62}$$

From (5.62),

$$R^{-1}(i) - R_i^{-1}(l) = -R^{-1}(i) R_{i,l-i}(l) R^{-1}(l-i) R_{l-i,i}(l) R_i^{-1}(l). \tag{5.63}$$
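
For readers who want the intermediate step, (5.63) follows from (5.62) by the elementary identity $A^{-1} - S^{-1} = A^{-1}(S - A)S^{-1}$. The shorthand $A$, $B$, $D$ and $S$ below is introduced only for this sketch and is not part of the original argument; $R_i^{-1}(l)$ denotes the upper left $i \times i$ block of $R^{-1}(l)$, and $B' = R_{l-i,i}(l)$ by the symmetry of $R(l)$.

```latex
\begin{align*}
A &= R(i), \quad B = R_{i,l-i}(l), \quad D = R(l-i), \quad
S = A - B D^{-1} B' \;\; \bigl(\text{so } S^{-1} = R_i^{-1}(l) \text{ by (5.62)}\bigr),\\
A^{-1} - S^{-1} &= A^{-1}(S - A)S^{-1} = -A^{-1} B D^{-1} B' S^{-1}.
\end{align*}
```

Substituting the definitions of $A$, $B$, $D$ and $S$ back into the last line gives (5.63) in the notation of the proof.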

On substituting (5.63) into the right-hand side of (5.61), an upper bound 
for the left-hand side of (5.61) is given by 

(5.64) C\\M n (l,iWE(\\n n (l,i)\\ 2 % 
where 

M n (l,i) = R l ^ l (l)R- 1 (i)Ri n) (i)R- 1 (i)R hl _ i (l) 

and 

^ n—y/n—l 

u n (l,i) = R~ 1 (l-i)R l _ lyl (l)Rr l (l)- ]T Xj (i)e i+ i. 



Observe that, for all 1 < i < I < -fT n , 

||M n (i,»)|| < ll^ 1 / 2 ^)^^)^" 1 / 2 ^)!!!!^,^/)^" 1 ^)^ 

< C(||i?(Z - *)|| + W - - Ri-i^R'WRij- 

=c(iin(i-.-)ii + ii(^ -(or 1 ii) 

<c, 
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where the second inequality follows from the boundedness of 
maxi<i<K n ||J?i n \i)||, RT^^-il) is the lower right (I — i) x (I — i) block of 
and the last inequality is ensured by the fact that, for all 1 < i < I < 



||( J R^ ) _(0) -1 |l<Amax(i?(^n))<C ) 



where, for a symmetric matrix $A$, $\lambda_{\max}(A)$ denotes its maximum eigenvalue.
As a result,

$$\max_{1 \le i < l \le K_n} \|M_n(l,i)\| \le C. \tag{5.65}$$



Moreover, by arguments similar to those used for showing (5.60) and (5.65),
one has, for all $1 \le i < l \le K_n$,

$$E(\|u_n(l,i)\|^{2q}) \le C\left(\frac{l-i}{N}\right)^{q},$$

which, together with (5.61), (5.64) and (5.65), yields that, for all $1 \le i < l \le K_n$,

$$E(|(\mathrm{III})|^{2q}) \le C\left(\frac{l-i}{N}\right)^{q}.$$

This completes the proof of Lemma 8. □
We are now ready to prove Theorem 2. 

Proof of Theorem 2. By Hölder's inequality, one has, for $1 < r < \infty$,



E 



UK 



(5.66) 



fc=l v 



2g 



h{k)-h{K) 



2qr\ 1/r 



{P{k S n=k)} 



(r-l)/r 



Set $0 < \xi < \min\{1/2, \delta_1/2\}$. [Recall that $\xi$ is defined in (K.6) and $\delta_1$ is
defined in (K.4).] Since (K.6) is assumed, there is a nonnegative number
$\theta = \theta(\xi)$ such that (3.2) is fulfilled. By (3.2) and Lemma 8, the right-hand
side of (5.66) is bounded by



c E 



(5.67) 



V. k=l 

H A n,6 



NL n (k) 



+ E 

k=i 



k-kl 



kf~ 1)q K e + J2 

k=l 



Kn 



NL n {k) 



k-k* 



{P(K=k)} 



(r-l)/r 



NL n (k) 



{P{k s n =k)} 



(r-l)/r 
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where A n fi is defined in (K.6). First observe that, for sufficiently large q, 

the first term on the right-hand side of (5.67) converges to 0. In addition, by
analogy with (5.39) and the fact that for $a, b > 0$, $(a + b)^{(r-1)/r} \le a^{(r-1)/r} + b^{(r-1)/r}$,



(5.68) 



E 

k=l 

k£A n , 



k k r 



NL n (k) 



{P(k S n =k)} 



(r-l)/r 



K„ 



<E E 



i=i k k=i 



k k r] 



{P(\V in (k)\ > (l/5)V n (k))} 



(r-l)/r 



NL n (k) 

By (3.2), (5.43) and (5.45), one has, for sufficiently large n and q, 



E 

fc=i 



k k n 



NL n (k) 



{P(\V ln (k)\>(l/5)V n (k))}^ l)/r 



(5.69) 



and 



= o(l) 



Kn 



E 

k=l 



k k r 



NL n {k) 



{P{\V2n{k)\>{l/h)V n {k))f- 1)/r 



(5.70) 



k=l 



o(l). 



According to (3.2) and (5.47), one has, for sufficiently large n and q, 



Kn 



E 

fc=i 

k€A„o 



k-k* 



NL n {k) 



<ckf E(i^ + 



{P(|^(fc)| >(l/5)K(A:))} ( ^ 1)/r 



(5.71) 



§7^ + E^ 9/ + E *" ff/2 
fc=i fc=fc*+i > 






= o(l), 

where the last equality is ensured by $0 < \xi < \min\{1/2, \delta_1/2\}$. Similarly, by
(3.2) and (5.48),



(5-72) Yl 



k=l 
fc6A n a 



k k~ 



NL n (k) 



{P(\VUk)\ > (l/5)V n (k))} 



(r-l)/r 



o(1),



provided q is sufficiently large. Finally, by (3.2) and (5.49), one has, for 
sufficiently large n and q, 



E 

k=i 

fe€-A n i 



iVL^fe) 



{P(|F 5n (fc)| >(l/5)V r „(fe))} 



(r-l)/r 



^ r ;.»fafe ||a(fc)-a(Aft)|fe 



(5.73) 



<c*f E fc » + E fc " 9/2 
\fc=i fe=fc*+i / 

= o(1).

In view of (5.69)-(5.73), for sufficiently large $q$, the left-hand side of (5.68)
converges to 0. Consequently, the left-hand side of (5.67) also converges to 0
for sufficiently large $q$. This result, Jensen's inequality and (5.66) yield



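$$\lim_{n\to\infty} E\left|\frac{f_1(\hat k_n^S) - f_1(k_n^*)}{L_n^{1/2}(k_n^*)}\right|^{2q} = 0,$$

for any $q > 0$. Moreover, by Theorem 3 and Lemma 7,

$$\lim_{n\to\infty} E\left[\frac{|f(\hat k_n^S) - f(k_n^*)|^2}{L_n(k_n^*)}\right] = 0. \tag{5.74}$$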



On the other hand, by Wei ([29], Lemma 2) it can be shown that 
E|5(fc)-5(fe;)|^<C||a(fe)-a(fe:)||^. 
By the definitions of $L_n(k)$ and $k_n^*$, it is easy to see that



N 



a 



\\ a (k)- a (k*J\\^<{L n (k)-L n (k* n ) + 

These facts and an argument similar to that used for verifying (5.74) yield

$$\lim_{n\to\infty} E\left[\frac{|S(\hat k_n^S) - S(k_n^*)|^2}{L_n(k_n^*)}\right] = 0. \tag{5.75}$$

As a result, (3.8) with $\hat k_n = \hat k_n^S$ follows from (5.74), (5.75), Proposition 2
and the Cauchy-Schwarz inequality. Moreover, by arguments similar to those
used for showing (5.74), (5.75) and Corollary 2, (3.8) also holds with
$\hat k_n = \hat k_n^A$, $\hat k_n^F$, $\hat k_n^{S_p}$ or $\hat k_n^C$. □

Proof of Corollary 1. The proof of Corollary 1 is similar to that 
of Theorem 2. The details are omitted. □ 

6. Discussion and concluding remarks. (1) Main contributions. Due to 
its success in practical applications, AIC has received considerable attention 
among researchers (and practitioners) from various disciplines in the past 
three decades; see [10]. However, the statistical properties of AIC are still 
not clear when it is used for forecasting the future of an observed time se- 
ries. In the present article we have attempted to resolve this problem. Armed 
with some new technical tools, Theorem 2 and (4.4) successfully show how 
well AIC (and its variants) can work for same-realization predictions. The 
simulation results given in Table 1 also show that the finite-sample perfor- 
mance of AIC is satisfactory in many practical situations. Corollary 1 and 
(3.12)-(3.14) explore the prediction efficiencies of some other AIC-like criteria
having different penalties for the number of regressors in the model.
By the same argument used for proving Theorem 2 and the ideas of Shibata 
([27], Section 5), Bhansali [5] and Ing and Wei ([17], Section 4), extending 
Theorem 2 and Corollary 1 to the multistep prediction case is straightfor- 
ward. 
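
As an illustration of the selection rule discussed above, the following minimal sketch fits AR(k) models by least squares for k = 1, ..., K, picks the order minimizing AIC, and then forecasts the next value of the same realization. It assumes the common form AIC(k) = log(sigma2_hat(k)) + 2k/N with N = n - K; the function names (`fit_ar_ls`, `select_order_aic`, `predict_next`) and the simulated MA(1) example are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_ar_ls(x, k, K):
    """Least-squares AR(k) fit on targets x[K], ..., x[n-1] (N = n - K rows for every k <= K)."""
    n = len(x)
    # Column i holds the lag-i regressor x[j-i] for each target x[j], j = K..n-1.
    X = np.column_stack([x[K - i:n - i] for i in range(1, k + 1)])
    y = x[K:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid) / len(y)


def select_order_aic(x, K):
    """Order in 1..K minimizing AIC(k) = log(sigma2_hat(k)) + 2k/N (assumed form)."""
    N = len(x) - K
    aic = [np.log(fit_ar_ls(x, k, K)[1]) + 2.0 * k / N for k in range(1, K + 1)]
    return int(np.argmin(aic)) + 1


def predict_next(x, k, K):
    """Same-realization one-step forecast: the model is applied to the series it was fitted on."""
    coef, _ = fit_ar_ls(x, k, K)
    return float(coef @ x[-1:-k - 1:-1])  # regressors (x[n-1], ..., x[n-k])


# Hypothetical example: an MA(1) series, which has an AR(infinity) representation.
rng = np.random.default_rng(0)
e = rng.standard_normal(501)
x = e[1:] + 0.5 * e[:-1]
K = 10
k_hat = select_order_aic(x, K)
x_next = predict_next(x, k_hat, K)
```

Using the same N rows for every candidate order keeps the residual variances comparable across k; this is a design choice of the sketch, not a statement about how the simulations in Table 1 were run.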

On the other hand, Table 2 indicates that for same-realization predictions
it seems very difficult for AIC to be strongly asymptotically efficient.
This phenomenon not only points out another dissimilarity between same-
and independent-realization predictions [since AIC is strongly asymptotically
efficient for independent-realization predictions; see (4.10)], it also
inspires a new direction for time series model selection, that is, selecting
models (or orders) through the second-order conditional MSPE, namely,
$E[(x_{n+1} - \hat x_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2$. As suggested by Table 2, a model having
the minimal second-order conditional MSPE can be asymptotically much
more efficient than the model selected by AIC in the sense that $r^* \gg 1$ for all
large $n$. [Recall that $r^*$ is defined after (4.8).] Unfortunately, since
$E[(x_{n+1} - \hat x_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2$ is unobservable, it cannot be used as a selection
criterion in practice. However, we conjecture that a model selection criterion
based on a reliable estimator of $E[(x_{n+1} - \hat x_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2$ should
also outperform AIC for same-realization predictions. For a related discussion
on estimating $E[(x_{n+1} - \hat x_{n+1}(k))^2 \mid x_1, \ldots, x_n] - \sigma^2$ in finite-order AR
models, see [17].

(2) Moment restrictions. For independent-realization predictions, we provide
a set of sufficient conditions in Remark 6 which guarantees that AIC
achieves Shibata's asymptotic efficiency in non-Gaussian settings. Since Remark 6
only assumes that the error distributions have uniformly bounded
$(4 + \delta_0)$th moments, where $\delta_0$ is any (small) positive number, it notably
improves the best previous result given by Lee and Karagrigoriou ([21],
Theorem 3.1), which required the existence of the eighth moment. However,
to ensure that AIC is asymptotically efficient in the senses of (4.3) and (4.4), 
(K.3) is needed; see Theorems 1 and 2. This is because (2.4) and (2.5) are 
required to hold for sufficiently large q in the proofs of Theorems 1 and 2; 
and Proposition 1, which is the first result giving sufficient conditions [in- 
cluding (K.3)] such that (2.4) and (2.5) are fulfilled for any q > 0, is used to 
meet this requirement. Although (K.3) is rather stronger than is necessary, 
it is convenient. Note that if (K.4) is replaced by a stronger assumption than 
K® +s ° = 0(n) for some 5q > 0, then the first part of Theorem 2 of [17] holds 
instead of Proposition 1. By applying this result to our analysis, (K.3) in 
Proposition 2 and Theorem 1 can be weakened to sup_ 00<t<00 -E|et| 20 < oo 
and sup_ 00<t<00 E\e t | 36 < oo, respectively. But for Theorem 2, (K.3) needs 
to be replaced by a more complicated moment restriction which may depend
on the value of $\theta$ [which is defined in (K.6)]. Consequently, while (K.3) can
be slightly relaxed at the price of reducing the number of candidate models, 
the disadvantages seem to outweigh the merits. To resolve this dilemma, 
it is necessary to verify (2.4) and (2.5) under milder moment conditions. 
However, this topic is beyond the scope of the present article. 

(3) Extensions to the regression model. As mentioned in Section 1, the 
main difficulty of analyzing AIC's (and its variants') same-realization MSPE 
in time series models lies in the fact that future observations, estimated pa- 
rameters and selection criteria are all stochastically dependent and, hence, 
Shibata's approach is no longer applicable. For the regression model, how- 
ever, the (commonly used) assumption of independent observations yields 
that the future observations are independent of the estimated parameters 
and selection criteria even in the same-realization case, which substantially 
simplifies the task of analyzing the model selection criterion's (same-realization) 
MSPE. This is exactly the same situation encountered in independent-realization
predictions for time series models (see Section 1). In fact, under
a Gaussian regression model with infinitely many parameters, Breiman
and Freedman [7] showed that $S_p$ is asymptotically efficient for "same-realization"
predictions from a conditional MSPE point of view. [Note that
their asymptotic efficiency is the same as the one discussed in (4.11).] It also
can be shown that their result still holds with $S_p$ replaced by AIC, FPE,
$C_p$ or $S_n(k)$. In addition, we conjecture that their result can be extended to
unconditional versions without the Gaussian assumption, provided suitable
smoothness conditions [such as (K.2)] on the distributions of the (random)
regressors and the white noise are imposed.

(4) The possibility of extensions to the multivariate case. Since for multivariate
time series, AR models are the most used models by far, order
selection for vector AR models has attracted growing interest among re- 
searchers from various disciplines in recent years. For example, Findley and 
Wei [12] recently presented the first mathematically complete derivation of 
the multivariate AIC for comparing vector AR models fit to stationary series 
in independent-realization settings. When the candidate vector AR models 
are misspecified, Schorfheide [23] also considered order selection problems 
for the purpose of independent-realization predictions. However, since these 
results focus on independent-realization cases, it would be of interest to ex- 
tend our same-realization results to the multivariate case. To achieve this 
goal, extending Proposition 1 to stationary multivariate time series is neces- 
sary. Taking the approaches used to verify Theorem 4.1 of [12] [which gives 
a multivariate version of (2.4) but with $K_n$ fixed with $n$] and Theorem 2
of [17], a multivariate extension of Proposition 1 can be easily obtained. 
In addition, generalizations of the moment bounds of Section 5 to the vec- 
tor case are also required for establishing the desired results. However, since 
these generalizations are not straightforward, further investigation along this
direction is needed.

Proof of (3.4). We first show that, for sufficiently large $n$,

$$\frac{1}{\beta}\log N - C\log_2 N \le k_n^* \le \frac{1}{\beta}\log N + C\log_2 N, \tag{A.1}$$

where $C$ is some positive number and $\beta$ is defined in (3.3). Let $k = k_n^* - d$,
where $d$ is some positive integer. Then

$$L_n(k) - L_n(k_n^*) = \|a - a(k)\|_R^2 - \|a - a(k_n^*)\|_R^2 - \frac{d}{N}\sigma^2 \ge 0. \tag{A.2}$$
According to (A.2) and (3.3), one has, for some C > 0, 

where $\theta_1$ is defined in (3.3). Taking the (natural) logarithm of both sides,
we get

$$k_n^* \le \frac{1}{\beta}\log N + C\log_2 N, \tag{A.3}$$

for some $C > 0$. In view of (A.3) and (K.4), we have $k_n^* + k_n^{*\eta} \le K_n$ for any
$0 < \eta < 1$ and for sufficiently large $n$. Now let $k = k_n^* + k_n^{*\eta}$ for some $0 < \eta < 1$.
(Here we assume without loss of generality that $k_n^{*\eta}$ is an integer.) In this
case

$$L_n(k) - L_n(k_n^*) = \frac{k_n^{*\eta}}{N}\sigma^2 - \bigl(\|a - a(k_n^*)\|_R^2 - \|a - a(k)\|_R^2\bigr) \ge 0. \tag{A.4}$$

By (3.3), (A.4) and the fact that $k_n^* \to \infty$, one has, for sufficiently large $n$
and some $C > 0$,

and, hence,

$$k_n^* \ge \frac{1}{\beta}\log N - C\log_2 N. \tag{A.5}$$

Consequently, (A.1) follows from (A.3) and (A.5).
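
A quick numerical sanity check of the order of magnitude of $k_n^*$ under exponentially decaying bias is sketched below. It assumes the idealized case in which the squared bias $\|a - a(k)\|_R^2$ equals $c\,e^{-\beta k}$ exactly (a simplification of (3.3) with no polynomial factor) and $L_n(k) = k\sigma^2/N + \|a - a(k)\|_R^2$; the constants and the function name `argmin_L_exp` are hypothetical.

```python
import numpy as np


def argmin_L_exp(N, sigma2, c, beta, k_max):
    """Integer minimizer of L(k) = k*sigma2/N + c*exp(-beta*k) over k = 1..k_max."""
    k = np.arange(1, k_max + 1, dtype=float)
    L = k * sigma2 / N + c * np.exp(-beta * k)
    return int(np.argmin(L)) + 1


sigma2, c, beta = 1.0, 1.0, 0.5  # illustrative constants only
for N in (10**3, 10**5, 10**7):
    k_star = argmin_L_exp(N, sigma2, c, beta, k_max=200)
    # Continuous minimizer: sigma2/N = c*beta*exp(-beta*k)  =>  k = log(N*c*beta/sigma2)/beta.
    k_cont = np.log(N * c * beta / sigma2) / beta
    print(N, k_star, round(k_cont, 2), round(np.log(N) / beta, 2))
```

Because this idealized L is convex in k, the integer minimizer stays within one unit of the continuous one, which in turn differs from $(1/\beta)\log N$ only by a constant, well inside a window of width $C\log_2 N$.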

To show (3.4), first assume that $k \le k_n^* - k_n^{*\eta}$ for some $0 < \eta < 1$. By (3.3)
one has, for sufficiently large $n$ and some $C > 0$,

L n (k) - L w (fc* ) _ ||a - a(fc)|fe - ||a - a(fc*)|fe - ((fc* - fc)/AQq 2 
(k*-k)/N (K~k)/N 

(A.6) 

cfc*~ fll e-^ fc "- fc ""> 2 

- A;*/iV 

In view of (A.1), the right-hand side of (A.6) diverges to infinity and, hence,
(3.4) holds for $k \le k_n^* - k_n^{*\eta}$.

For $k \ge k_n^* + k_n^{*\eta}$, $0 < \eta < 1$, one has

L n (fc) - L w (fc*) > L w (fc) - L n (fc* + (l/2)fc*") 
(k-k*)/N ~ (k-k*)/N 

^a 2 ||a-a(fc* + (l/2)Q||* 

By (3.3) and (A.1), the second term on the right-hand side of (A.7) converges
to 0. Therefore, (3.4) holds for $k \ge k_n^* + k_n^{*\eta}$. □

Proof of (3.7). Let $k = k_n^* - d$, where $d$ is some positive integer. By
(3.5) and (A.2),

$$\frac{d}{N}\sigma^2 \le \bigl(C_4 + M_1(k_n^* - d)^{-\xi_1}\bigr)(k_n^* - d)^{-\beta} - \bigl(C_4 - M_1 k_n^{*-\xi_1}\bigr)k_n^{*-\beta}. \tag{A.8}$$

By Taylor's theorem, (A.8) can be further expressed as

$$\frac{d}{N}\sigma^2 \le C_4\beta d\, k_n^{*-1-\beta} + O\bigl(k_n^{*-2-\beta}\bigr).$$

Therefore, for sufficiently large $n$ and some $C > 0$,

$$\left(\frac{\sigma^2}{N C_4 \beta}\right)^{-1/(\beta+1)} \ge k_n^* - C.$$

Since $\beta > 1 + \delta_1$, we can choose a positive integer, $d$, such that $k_n^* + d \le K_n$
for all large $n$. By letting $k = k_n^* + d$ and by an argument analogous to that
used in (A.8), we have, for sufficiently large $n$ and some $C > 0$,

$$\left(\frac{\sigma^2}{N C_4 \beta}\right)^{-1/(\beta+1)} \le k_n^* + C.$$

As a result,

$$k_n^* = \left(\frac{\sigma^2}{N C_4 \beta}\right)^{-1/(\beta+1)} + O(1). \tag{A.9}$$
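
The leading term in (A.9) can be checked numerically under a simplifying assumption. The sketch below takes the squared bias to be exactly $C_4 k^{-\beta}$ (that is, it drops the lower-order correction allowed by (3.5)) and uses $L_n(k) = k\sigma^2/N + \|a - a(k)\|_R^2$; the constants and the function names are hypothetical.

```python
import numpy as np


def argmin_L_poly(N, sigma2, c4, beta, k_max):
    """Integer minimizer of L(k) = k*sigma2/N + c4*k**(-beta) over k = 1..k_max."""
    k = np.arange(1, k_max + 1, dtype=float)
    L = k * sigma2 / N + c4 * k ** (-beta)
    return int(np.argmin(L)) + 1


def k_closed_form(N, sigma2, c4, beta):
    """Leading term of (A.9): (sigma2 / (N*c4*beta)) ** (-1/(beta+1))."""
    return (sigma2 / (N * c4 * beta)) ** (-1.0 / (beta + 1.0))


sigma2, c4, beta = 1.0, 1.0, 2.0  # illustrative constants only
for N in (10**3, 10**5, 10**7):
    k_star = argmin_L_poly(N, sigma2, c4, beta, k_max=10_000)
    print(N, k_star, round(k_closed_form(N, sigma2, c4, beta), 2))
```

In this idealized setting the closed form is the exact continuous minimizer (set the derivative $\sigma^2/N - C_4\beta k^{-\beta-1}$ to zero), so the integer minimizer differs from it by less than one, in line with the $O(1)$ remainder.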

Armed with (A.9), the proof of (3.7) is divided into four cases.

Case 1. $1 \le k \le \theta_2 k_n^*$, where $0 < \theta_2 < 1$ is chosen to satisfy $\theta_2^{-\beta} - 1 > \beta$.
By (3.5) and (A.9), one has

L n (k) - L n (k* n ) = ||a - a(fc)||| - ||a - a(<)|& " ^^* 2 
>||a-a(«)|||-||a-a(^)|||-^ 2 



^ 2 (^-l-/3 + o(l)). 



Therefore, 



holds for sufficiently large $n$, $1 \le k \le \theta_2 k_n^*$, and some $C > 0$.

Case 2. $\theta_2 k_n^* < k < k_n^*$, where $\theta_2$ is defined as in case 1.
By Taylor's theorem, (3.5) and the assumption that $\xi_1 > 2$,

L n (k) - L n {k* n ) 



(K - k)/N 
N 



£d + /?(/? + i)d 2 



(A.ll) 



2 
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{3((3 + l)([3 + 2)d 3 



+ 



6k* 



k*-"(c, + o(k* n - ei )) 



N 



d\ 1 f C 4 /3(i C 4 d 2 f3jp,o 2 d 



N 



L*/3 + l + h. 
K n K, 

1 



,2+/3 



N 



<? z + o(K 



where $d = k_n^* - k$, $0 < a_n < d k_n^{*-1} < 1 - \theta_2$ and

[3 + 1 (p + l)(J3 + 2)d 



+ 



2 ' 6*£(l-a n )0+ 3 ' 
In view of (A.9) and (A.11), we have



(A.12) 



(K-k)/N = g(*M»+Op» 



for sufficiently large $n$ and some $C > 0$, provided $d \ge C_6$ with some $C_6 > 0$.

Case 3. $(1 + \delta_2)k_n^* \le k \le K_n$, $0 < \delta_2 < 1$.
By (3.5) and (A.9),

Ln(fc) - L n (k* n ) = ±a 2 - C,K\l - (1 + {d/K))~ P ) 



(A.13) 



-tl-0, 



d k, 



n 2 



N (5N 



a\l-{l + {d/k* n )T P ){l + o{l)) 



+ o{ki h -% 

where $d = k - k_n^*$. By (A.13) one has, for sufficiently large $n$,



L n (k) L n (kn) _ £J 2| ^ 



1 



(3(d/K 



(k-k*)/N 

(A.14) 

> C>0, 

where the last inequality follows from 



ii-(i+(d/k* n )r p )(i+o(i)) 



0(1) 
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and 



$$g(x) = 1 - \frac{1}{\beta x}\bigl(1 - (1 + x)^{-\beta}\bigr) > 0,$$

for $x > 0$.



Case 4. $k_n^* < k < (1 + \delta_2)k_n^*$ with $0 < \delta_2 < 3/(\beta + 2)$.
By Taylor's theorem, (3.5), (A.9) and the assumption that $\xi_1 > 2$, one
has, for sufficiently large $n$ and some $C > 0$,

L n (k) — L n (/c*) 

MP 



(A.15) 



d 2 



^(l + O^-WD)) 



/3d /?Q3 + l)d 2 

h* 



+ 



j 9( f 5 + l){[3 + 2)d 3 



6k* 



(l + a r 



-(3-3 



a 



(J3 + l)d (/3 + l)(/3 + 2)d 2 



(i + an )-^ + o{kD 



2k* 



6k* 



h* 
n 

provided $d = k - k_n^* \ge C_6$ for some $C_6 > 0$. Here $a_n$ is some positive number
which satisfies $0 < a_n < d/k_n^*$, and the last inequality follows from the
condition on $\delta_2$.

Consequently, the desired property (3.7) is ensured by (A.10), (A.12),
(A.14) and (A.15). □
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